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We  address  the  problem  of  estimating  the  parameters  of  a  signal 
embedded  in  noise.  The  signal  is  composed  of  samples  of  a  sum  of  M 
exponentially  damped  or  undamped  sinusoidal  signals.  That  is,  s(n)  the 
signal  samples,  are  given  by  the  formula  s(n)  _  L  esj<n  ,  where 

k=Vv 

{a^}  are  the  amplitudes  of  the  exponentials.  The  problem  is  to  estimate 
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(Abstract  con't.) 


the  value  of  M#  {a.  }  and  (s, }  from  a  short  record  of  the  data  samples 
which  are  corrupted  by  noise. 

We  suggest  improvements  to  the  original  approach  proposed  by  Prony. 

He  observed  that  the  samples  s(n)  obey  an  M  th  order  difference  equation 
and  that  from  the  roots  of  the  characteristic  polynomial  of  the  difference 
equation,  the  parameters  {s, }  can  be  determined.  We  present  three  methods 
to  improve  this  approach  when  the  signal  is  noise  corrupted.  The  two  key 
ideas  used  in  the  improvements  are  listed  be  lew  in  the  order  of  importance. 

(1)  By  using  a  higher  order  (greater  than  M)  difference  equation 
formed  with  the  data  samples,  we  try  to  explain  some  of  the  noise  in  the 
data  by  a  few  additional  exponentials,  which  are  subsequently  eliminated. 

(2)  The  signal  samples  s(n)  when  embedded  in  a  (Toeplitz  or  Hankel) 
matrix  result  in  a  rank  deficient  matrix.  This  is  a  special  property 

of  the  exponential  signals  and  is  independent  of  the  signal  parameters. 

We  make  use  of  this  knowledge  regarding  the  signal  to  improve  the  signal 
to  noise  ratio  (SNR)  in  the  data.  This  is  accomplished  in  a  natural  way 
using  singular  value  decomposition  (SVD)  of  a  matrix. 

The  improvement  in  performance  of  our  methods  over  traditional 
methods  is  demonstrated  by  computer  simulation. 
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ABSTRACT 


^We  address  Che  problem  of  estimating  the  parameters  of  a  signal  embedded 
in  noise.  The  signal  is  composed  of  samples  of  a  sum  of  M  exponentially  damped 

or  undamped  sinusoidal  signals .\  That  is,  s(n)  the  signal  samples,  are  given  by 

^ _ _ _ _ _ 1 

the  formula  s(n)  ■  Ea^  e8^n  ,  where  a.  are  the  amplitudes  of  the  exponentials. 
k*i 

The  problem  is  to  estimate  the  value  of  M,  {a^}  and  {s^}  from  a  short  record  of 
the  data  samples  which  are  corrupted  by  noise. 

v  We  suggest-  improvements  to  the  original  approach  proposed  by  Prony.  He 
observed  that  the  samples  s(n)  obey  an  M  th  order  difference  equation  and  that 
from  the  roots  of  the  characteristic  polynomial  of  the  difference  equation,  the 
parameters  {s,}can  be  determined.  We  present  three  methods  to  improve  this 

v._  _ K.y 

approach  when  the  signal  is  noise  corrupted*  The  two  key  ideas  used  in  the  ixn- 
provements  are  listed  below  in  the  order  of  importance. 

(1)  By  using  a  higher  order  (greater  than  M)  difference  equation  formed 
with  the  data  samples,  ve  try  to  explain  some  of  the  noise  in  the  data  by  a  few 
additional  exponentials,  which  are  subsequently  eliminated. 

(2)  The  signal  samples  s(n)  when  embedded  in  a  (Toeplitz  or  Hankel)  matrix 
result  in  a  rank  deffident  matrix.  This  is  a  special  property  of  the  exponen¬ 
tial  signals  and  is  Independent  of  the  signal  parameters.  We  make  use  of  this  • 
knowledge  regarding  the  signal  to  Improve  the  signal  to  noise  ratio  (SNR)  in 

Che  data.  This  is  accomplished  in  a  natural  way  using  singular  value  decompo¬ 
sition  (SVD)  of  a  matrix. 

The  Improvement  in  performance  of  our  methods  over  traditional  methods  is 
demonstrated  with  computer  simulation. 
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CHAPTER  1 :  INTRODUCTION 


1.1:  PROBLEM  STATEMENT: 

We  address  the  problem  of  estimating  certain  unknown  parameters  of  signals 
corrupted  by  noise.  Let  us  assume  that  a  sequence  of  numbers,  y(l) ,y(2) , . . .y(N) , 
which  are  samples  of  a  noise  corrupted  signal.  Is  available  for  processing  by  a 
digital  computer.  The  sequence  y(n) ,  n-l,2...N  is  further  assumed  to  be  com¬ 
posed  of  samples  of  a  sum  of  M  unknown,  exponentially  damped  or  undamped  sinu¬ 
soidal  signals  and  additive  noise  w(n) .  That  is. 


y(n)  -  I  a,  e8kn  +  w(n)  ,  n«l,2...N 
k-1  k 


1.1 


The  sampling  period  is  assumed  to  be  one  second  without  loss  of  generality.  The 
Nyquist  Interval  is  0-2ir,  a^  and  s^,  k*l,2,...M  are  unknown  complex  numbers ,  In 
general.  M  is  also  unknown.  We  write  the  complex  numbers  s^  explicitly  as 
follows: 


sk  “  °k  +  Jwk 

The  s  shall  be  called  the  damping  factors,  and  the  ' s  are  the  radian  fre¬ 
quencies  that  will  lie  between  0  and  2tt.  j  Is  the  square  root  of  minus  one. 
Given  the  N  samples  of  noise  corrupted  data,  the  problem  is  to  estimate 

ak,  sk,  k-l,2,...M. 


the  unknown  values  of  M, 


2 


1.2:  SIGNIFICANCE  OF  THE  PROBLEM: 

The  above  is  an  age  old  problem  and  has  occurred  sometimes  in  disguise  in 
several  areas  of  applied  science.  Proliferation  of  high  speed  computers  has  re¬ 
kindled  interest  in  this  problem  in  the  past  two  decades.  The  following  examples 
drawn  from  different  areas  are  representative  of  the  occurrence  of  this  problem. 

In  speech  signal  processing  literature,  it  has  been  observed  that  the 
human  vocal  tract  coupled  with  the  nasal  cavity  behaves  approximately  like  a 
pole-zero  filter  [1].  Wideband  excitation  to  this  filter  is  provided  by  the 
vibration  of  the  vocal  cord  in  the  form  of  quasi-periodic  train  of  short  time 
duration  pulses.  Therefore,  to  a  first  order  approximation  [2],  each  pitch 
frame  (the  time  duration  in  between  two  pitch  pulses)  of  the  speech  waveform 
for  (nasalized)  vowel  sounds  can  be  assumed  to  be  a  sum  of  exponentially  damped 
sinusoidal  signals.  The  sequence  of  noise  values,  w(n) ,  in  formula  1.1,  repre¬ 
sents  the  error  in  this  assumption  plus  any  observation  noise.  Thus,  given  a 
sequence  of  samples  of  the  speech  waveform  from  individual  pitch  frames,  one 
is  interested  in  obtaining  a  pole-zero  representation  for  the  vocal  tract,  fcr 
speech  data  compression  purposes. 

In  another  example  the  signals  could  be  undamped  sinusoidal  signals, 
closely  spaced  in  frequency  compared  to  the  reciprocal  of  the  observation  in¬ 
terval  of  N  seconds.  This  corresponds  to  assuming  ■  0,  k-l,2,...M  in  formula 
1.1.  This  is  the  so  called  spectral  resolution  problem  and  has  received  con¬ 
siderable  attention  in  signal  processing  and  statistical  time  series  analysis 
literature.  See  for  example,  the  review  article  by  Ray  and  Marple  [3].  Un¬ 
damped  sinusoidal  signals  occur  in  a  wide  variety  of  practical  problems,  ranging 
from  resolving  the  locations  of  distant  stars  by  a  radio  telescope  [4]  to  measur¬ 
ing  the  depth  of  the  ionospheric  layer  by  'ionospheric  sounding*  [5].  In  these 
problems  a  critical  factor  is  the  limitation  on  the  number  of  samples  available 


3 


for  processing.  For  example,  the  aperture  of  the  radio  telescope,  which  corres¬ 
ponds  to  N,  Is  necessarily  limited  by  cost  considerations;  whereas  In  the  second 
example,  the  ionospheric  layer  may  be  changing  in  depth  with  respect  to  time, 
thus  limiting  the  observation  interval  N. 

The  third  case  of  interest  is  when  the  w  's  in  formula  1.1  are  zero;  that 

k 

is,  the  signal  is  composed  of  a  sum  of  M  exponential  decays.  Such  signals  are 
often  observed  in  biomedical  problems  involving  radioactive  tracer  data  analy¬ 
sis  [6].  See  also  ref.  [7]. 

In  all  the  above  examples,  it  is  appropriate  to  consider  the  sequence  y(n) 
as  a  sum  of  M  deterministic  signals  and  white  noise  based  on  the  physics  of 
signal  generation  and  measurement.  However,  the  model  in  formula  1.1  can  also 
be  used  for  estimating  the  spectrum  of  an  autoregressive  (AR)  or  autoregressive 
moving  average  (ARMA)  process.  Such  processes  are  generated  when  an  all  pole 
(AR)  or  a  pole-zero  (ARMA)  filter  is  excited  by  a  white  noise  sequence.  Given 
a  short  record  of  the  random  process,  the  problem  is  to  estimate  the  spectrum 
of  the  process.  This  problem  can  be  put  in  our  framework  by  observing  that  the 
true  autocorrelation  sequence  of  AR/ARMA  processes  are  samples  of  a  sum  of  M 
exponentially  damped  signals,  M  being  the  number  of  poles  in  the  generating  fil¬ 
ter  [8,9].  Therefore,  if  we  estimate  the  autocorrelation  sequence  from  the 
given  data,  it  is  reasonable  to  model  the  estimated  autocorrelation  sequence 
as  in  formula  1.1.  Now,  w(n)  stands  for  the  difference  between  the  estimated 
and  the  true  autocorrelation  sequence  of  the  process.  Hence,  if  we  estimate 


the  unknown  parameters. 


s^,  and  M,  it  is  possible  to  obtain  a  spectral  esti¬ 


mate  of  the  process  by  estimating  the  generating  filter  transfer  function. 


1.3:  PREVIOUS  RESULTS: 

Since  our  problem  is  an  old  one  (see  for  example  ref.  [10])  an  exhaustive 
review  of  the  extensive  literature  is  almost  impossible.  However,  there  are  at 


least  three  major  approaches  to  estimating  the  unknown  parameters  from  the  N 
given  samples  of  data  y(l) ,y(2) , . . .y(N) .  We  shall  briefly  review  these  below. 
Some  other  possible  approaches,  which  are  not  discussed  here,  include  data  ex¬ 
trapolation  techniques  [11,12]  and  system  identification  techniques  [13]. 

1.3.1s  DIRECT  APPROACH: 

In  this  approach  one  attempts  to  fit  a  signal  model  to  the  data  sequence 
y(n)  in  the  least  square  sense.  That  is,  the  error  E 


N 


E  ■  I  |y(n)  -  h(n) 
n*l 


1.2 


M 


where  h(n)  ■  Ea^  eSkn,  is  minimized  by  choosing  a.  's  and  s,  's.  M  •>.  assumed 

k-1  *  k 


number  of  signal  components.  If  we  set  the  derivatives  of  E  with  respect  to  the 
unknown  parameters  to  zero  in  attempting  to  minimize  E,  we  realize  that  some  of 
the  necessary  conditions  are  in  the  form  of  nonlinear  equations.  Nevertheless, 
these  equations  can  be  solved  by  using  standard  nonlinear  minimization  techniques 
[14].  See  also  ref.  [15]. 

A  simplification  to  this  approach  was  suggested  by  several  authors  [2,7, 


16-18]  based  on  the  following  observation.  If  the  s^'s  are  known,  then  the 


minimization  of  E  is  a  linear  least  square  problem,  by  which  we  mean  that  the  un¬ 
known  a^'s  can  be  found  (to  minimize  E)  by  solving  a  set  of  necessary  conditions 
which  are  a  set  of  linear  equations.  Therefore,  one  can  choose  values  for  the 
3^'s,  based  on  some  prior  knowledge  or  based  on  some  preprocessing  and  then  find 
the  associated  a^'s  that  minimize  E.  Then  the  value  of  s^'s  can  be  altered  to 
find  new  values  for  a^'s  that  give  a  lower  minimum  for  E.  This  process  is  re¬ 
peated  until  a  local  minimum  for.E  is  found.  The  corresponding  values  of  a^'s 
and  s^’s  are  the  estimates  of  the  signal  parameters.  This  is  essentially  a 
search  procedure.  If  the  value  of  M  is  not  known,  then  the  above  procedure  can 

a  A 

be  repeated  at  different  assumed  values  for  M,  starting  from  M«l,  until  the  minimum 


value  of  E  does  not  change  appreciably. 

Ihls  approach  is  optimum  in  the  maximum  likelihood  (ML)  sense  if  the 
perturbations,  w(n) ,  are  independent  and  Gaussian.  But  this  method  has  cer¬ 
tain  disadvantages.  Much  computation  is  involved,  especially  if  the  true  value 
of  M  is  large  and  unknown.  Also,  one  could  terminate  the  search  in  a  local  mini¬ 
mum  instead  of  the  global  minimum. 

However  this  approach  is  not  unreasonable  in  the  special  case  when  the 
data  is  known  to  consist  of  undamped  sinewaves  and  noise  [18].  In  this  case, 
s^'s  (*j  w^'s)  are  purely  imaginary  numbers.  Therefore,  one  needs  to  search  for 

a 

the  minimum  of  E  only  in  the  interval  0-2ir  in  M  dimensions.  Also,  if  the  fre¬ 
quencies  of  the  sinewaves  are  spaced  greater  than  1/N  Hz  apart,  then  one  can 
obtain  frequency  estimates  (w^'s)  close  to  optimum  (in  the  ML  sense)  by  comput¬ 
ing  the  discrete  Fourier  transform  (DFT)  of  the  data  samples  [19].  This  can  be 
done  efficiently,  using  the  fast  Fourier  transform  algorithm  [20].  Other  rele¬ 
vant  references,  where  similar  direct  approaches  are  u  ed,  include  McDonough 
and  Huggins  [21],  Schweppe  [22],  Van  den  Boss  [23],  Tuttle  [102],  and  Aigrain 
and  Williams  [103]. 

1.3.1:  ITERATIVE  APPROACHES: 

Many  nonlinear  minimization  problems  can  be  solved  iteratively,  by  solving 
a  linear  minimization  problem  at  every  step  in  the  iteration.  Our  problem  is  no 
exception  to  this  rule.  We  shall  briefly  review  two  iterative  procedures  in  this 
section.  Both  attempt  to  minimize  the  error  E  in  formula  1.2  iteratively. 

The  first  method  that  we  shall  briefly  describe  is  that  of  Steiglitz  and 
McBride  [24]  and  is  an  out  growth  of  an  idea  proposed  by  Kalman  [25].  In  this 
method  the  error  E  in  formula  1.2  is  rewritten  as  follows: 

N-l  „ 

E  -  I  |y(n+l)  -  H(z){5(n) } |  , 
n-0 


1.3 


H(z)  (6(n)}  reads  H(z)  operating  on  6(n).  For  example  z  (y(n)}  -  y(n-l),  and 

if  H(z)  -  hQ  +  h1z  1  +  h2z~2,  then»  H(z)  (y(n)}  -  hQ  +  hj  y(n-l)^+  h2y(n-2). 

M 

Formula  1.3  follows  from  the  fact  that  the  signal  model  h(n)  «*  Z  a,  e8k  can  be 

k-1  K 

considered  as  an  impulse  response  [20]  of  a  discrete  time  pole-zero  filter  with 

transfer  function  H(z) .  Then  h(n)  can  be  written  in  operator  notation  [20]  as 

h(n)  ■  H(z)  {5(n)}  *  5(n),  where 

D(z) 

M-l  .  -i 

Z  n  z 

A.  .  N(z)  i-0  1 

H(z)  ■  *  ■  - 

D)z)  M 

1+  i«l  di  2 

A  A 

Therefore  one  wishes  to  minimize  E  in  1.3  by  choosing  N(z)/D(z)  or  equivalently 

by  finding  . nM_1  and  d^,d2 . d^.  Again,  this  problem  is  nonlinear  in 

parameters  {d^K 

Kalman's  suggestion  was  to  minimize  the  following  error. 


-  Min 

D(z),  N(z) 


I  |D(z) (y(n+l) }  -  N(z){d(n) } | 
n-0 


This  is  not  the  same  error  as  E  in  1.3.  But  the  advantage  of  this  modified 
error  criterion  is  that  it  can  be  minimized  with  respect  to  the  coefficients  of 
the  polynomial  N(z)  and  D(z)  by  solving  a  linear  least  squares  problem.  Stieglitz 
and  McBride  improved  on  this  idea  by  minimizing  the  following  error  iteratively. 

In  the  1  th  iteration,  they  minimized 


.  Min  . 
D1(z),  Ni(z) 


N-l  Di(z){y(n+1) }  N^CzHdCn)} 


°i-i  <z) 


Di-i  (z> 


with  respect  to  D,  (z)  and  N  (z) ,  with  D.  . (z)  known  from  the  previous  iteration. 


To  start  the  iteration  Dq(z)  is  calculated  by  some  simple  procedure.  The  work 
involved  in  each  iteration  is  the  same  as  in  Kalman's  method;  i.e.  only  linear 
least  squares  problems  need  be  solved  However  the  data  sequence  y(n)  is  pre¬ 
filtered  at  every  iteration  (i.e.  y(n)/D^_^(z)  is  used  instead  of  y(n)).  If 

A  A 

D^(z)  converges  to  D(z),  the  above  error  coincides  with  the  desired  error  in 
formula  1.3 

This  method  may  have  convergence  problems  [26] >  especially  when  the  data 
consists  of  slightly  damped  or  undamped  sinusoidal  signals  in  noise.  Also, 
there  seems  to  be  no  systematic  procedure  for  selecting  the  order  of  the  numer¬ 
ator  and  denominator  polynomials.  See  Done  and  Rushforth  [27],  Kay  [28],  and 
Stoica  and  Soderstorm  [29]  for  results  related  to  this  method. 

The  next  method  we  briefly  allude  to  is  that  of  Evans  and  Fischl  [30]. 

In  this  method  the  authors  make  use  of  the  following  observation  regarding  the 
error  in  formula  1.3.  For  a  given  polynomial  D(z),  E  can  be  minimized  by  solv¬ 
ing  a  linear  least  square  problem  by  finding  the  numerator  polynomial  coeffi- 

A  A 

cients.  Therefore  the  error  E  which  is  a  function  of  D(z)  and  N(z)  can  be 
rewritten  as  an  explicit  function  of  D(z)  only.  That  is,  E(N(z) ,D(z))»E^ 
(D(z)).  Then  an  Iterative  procedure  is  used  to  minimize  E^  (D(z))  with  respect 
to  D(z).  This  method  resembles  the  search  procedure  described  in  section  1.3.1 
where  the  minimization  was  carried  out  with  respect  to  the  parameters  (s^). 
Although  this  technique  was  intended  for  a  filter  design  problem,  it  can  be 
also  used  for  our  estimation  problem.  The  difficulties  associated  with  this 
method  include  order  selection  for  the  polynomials  and  computational  cost. 

1.3.3:  INDIRECT  APPROACH: 

This  approach  is  based  on  an  idea  proposed  by  Prony  [31]  in  1795.  His 
idea  was  to  transform  the  nonlinear  estimation  problem  into  a  linear  one.  It 
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is  based  on  the  following  observation.  For  example,  let  us  assume  that  the  data 

sequence  y(n)  consists  of  two  unknown  exponential  signals  only,  i.e.  y(n)*  Za^e8^. 

k-1 

We  want  to  determine  the  values  of  s^  s2,  ax,  and  a2-  Note  that  a  polynomial 
G(z)*l+g1z  +g2z  ,  with  g1“-(eSl+eS2)  and  g2  ■eSl+S2  'annihilates'  the  sequence 
y(n).  By  annihilates  [20]  we  mean  that  G(z){y(n)}  will  equal  zero  for  all  n. 

This  polynomial  is  termed  'prediction-error  polynomial'  here,  due  to  its  close 
connection  with  the  linear  prediction  analysis  [39].  Note  that  the  polynomial 
G(z)  has  roots  at  e  1  and  e  2.  Therefore  if  we  can  find  the  coefficients  of  the 


polynomial  G(z),  namely  g,  and  g  ,  given  the  sequence  y(n),  we  can  find  s  and 
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s2  by  finding  the  roots  (zeros)  of  the  polynomial.  The  coefficients  of  the  poly¬ 
nomial  can  be  found  by  solving  the  following  equations. 

y(3)  y(2)  y(l)l  fl  -  [o' 

8i  1-6 

y(4)  y(3)  y(2)  g2  0 

-  J  L  J  L  . 

The  right  hand  side  of  the  above  equations  is  set  to  zero  because  G(z)  annihi¬ 
lates  the  sequence  y(n).  Ideally,  only  4  (two  times  the  number  of  exponential 


signals  in  data)  contiguous  samples  of  y(n)  are  needed  to  find  gx  and  g2.  Once 
sx  and  s2  are  found,  the  values  of  ax  and  a2  can  be  found  by  solving  the  follow¬ 
ing  linear  equations. 


1.7 


Summarizing,  Prony's  contribution  was  to  transform  the  problem  of  finding  the 


nonlinearly  entering  parameters  s,  ,  k*l,2,...  M  into  a  problem  of  finding  the  co- 

M  -k 

efficients  g1,g2,....gM  of  an  M  th  degree  polynomial  G(z)-1  +  Z  g^  .  The 

k“l 

degree  of  the  polynomial  G(z)  should  equal  or  exceed  M,  the  number  of  signal 
components  in  the  data,  {g^}  are  found  by  solving  linear  equations.  The  {s^} 
are  found  from  the  zero  locations  of  G(z).  The  amplitudes  {a^}  are  subsequently 
found  by  solving  yet  another  system  of  linear  equations  as  in  1.7. 


Since  Prony's  time  this  idea  has  been  rediscovered  several  times  in  different 
fields.  This  technique  is  called  the  'covariance  method'  of  linear  prediction  in 
speech  signal  processing  literature  [39]. 

Since  Prony's  idea  was  intended  for  extracting  signal  parameters  of  a 
noiseless  signal  sequence  y(n)  (with  w(n)*0  in  formula  1.1) ,  any  observation  noise 
tends  to  perturb  the  estimates  of  the  parameters  {s^}  considerably  [32].  If  more 
than  2M  data  samples  are  available  for  processing,  the  effect  of  noise  can  be 
countered  to  some  extent  by  using  a  'least  squares'  approach,  as  can  be  expected. 
That  is,  the  additional  samples  can  be  used  to  write  more  equations  in  1.6,  and 
the  unknown  {g^}  can  be  found  by  satisfying  these  equations  in  the  least  square 
sense.  This  was  attempted  as  early  as  1924  in  ref.  [33].  See  also  Hildebrand 
[32]  and  Lanczos  [78].  There  have  been  numerous  other  modifications  to  Prony's 
method  in  which  attempts  have  been  made  to  counter  the  effects  of  noise.  Van 
Blaricum  [34,35]  has  compiled  a  long  list  of  references  relating  to  Prony's 
method.  Prony's  method  and  its  relation  to  representation  of  signals  by  a  sum 
of  exponentials  has  been  given  considerable  attention  in  refs.  [36-38]. 

Many  recently  popularized  parametric  spectrum  analysis  techniques  such  as 
linear  prediction  analysis  [39]  and  AR/ARMA  modelling  algorithms  [3]  bear  simi¬ 
larities  to  Prony's  method.  However,  these  algorithms  are  concerned,  in  general, 
with  obtaining  a  spectral  estimate  of  the  data  sequence,  instead  of  extracting 
specific  signal  parameters. 

The  Indirect  approach  to  estimating  the  signal  parameters  is  attractive  in 
comparison  to  other  approaches  due  to  at  least  three  reasons.  Firstly,  it  tends 
to  be  computationally  less  expensive.  It  Involves  solving  a  special  set  of 
linear  equations  for  which  computationally  efficient  algorithms  are  available 
[40,41],  and  rooting  a  polynomial  G(z) .  In  contrast,  the  direct  approach  de¬ 
tailed  in  section  1.3.1  Involves  a  multidimensional  search  and  the  iterative 


approach  is  not  guaranteed  to  converge.  Secondly,  even  if  the  signal  underlying 
the  data  is  not  exactly  a  sum  of  exponentials  as  assumed,  it  is  possible  to  draw 
meaningful  inferences  about  the  signal  from  the  polynomial  G(z)  by  evaluating  it 
on  the  unit  circle  |z|»l.  Thirdly,  the  resolution  of  closely  spaced  s^'s  (which 
could  represent  the  'poles  of  a  linear  system'  which  generated  the  signal  or  fre¬ 
quencies  of  sinewaves)  is  limited  primarily  only  by  the  SNR  in  the  data.  In  other 
words,  if  the  SNR  is  high  enough,  then  arbitrarily  closely  spaced  s^'s  can  be  re¬ 
solved  or  accurately  determined.  This  is  in  stark  contrast  with  Che  discrete 
Fourier  transform  based  methods  [42,19]  in  which  the  frequencies  closer  than  1/N 
cycles  apart  cannot  be  resolved  even  if  the  SNR  is  infinite.  The  situation  is 
even  worse  if  the  signals  are  exponentially  damped. 

1.4:  MAIN  RESULTS: 

Here  we  pursue  the  indirect  approach  originating  from  the  work  of  Prony. 

That  is,  we  compute  the  coefficients  of  a  prediction-error  filter  polynomial 
G(z)  from  the  N  given  samples  of  data  and  obtain  the  estimates  of  {s^}  from  M 
of  its  roots.  The  {a^}  are  subsequently  estimated  as  in  Prony 's  method.  The 
parameter  estimates  obtained  from  Prony 's  method  and  its  modifications  are  un¬ 
satisfactory  in  accuracy  at  low  SNR.  The  term  'low  SNR'  is  used  here  in  a  rela¬ 
tive  sense. 

In  this  dissertation,  we  show  how  to  obtain  accurate  estimates  of  (s^) 

(and  subsequently  {a^})  ^rom  the  zeros  the  polynomial  G(z)  at  low  SNR  values. 

Ue  also  give  procedures  for  finding  an  estimate  of  M. 

The  improvement  in  parameter  estimation  accuracy  achieved  by  our  methods 
is  primarily  due  to  two  reasons.  They  are  briefly  outlined  below  in  the  order 
of  importance. 

(1)  We  trade  redundancy  in  the  degree  of  G(z),  L,  for  increased  tolerance 
of  parameter  estimates  to  noise  in  the  data.  Ideally,  if  the  data  is  noiseless. 
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an  M  th  degree  polynomial  G(z)  can  be  used  to  find  { a^}  from  its  M  zeros  as  In 
Prony's  method.  But  if  the  data  is  noisy*  {s^}  cannot  be  estimated  accurately 
using  an  M  th  degree  polynomial  G(z) .  A  value  of  L  greater  than  M  is  necessary. 
This  amounts  to  attempting  to  'explain'  the  M  signal  components  and  the  noise 
with  L  exponentials.  This  redundancy  in  the  degrees  of  freedom  tends  to  in¬ 
crease  the  accuracy  in  the  parameter  estimates.  Several  authors  have  alluded  to 
this  fact  [43-49,18].  But  this  alone  is  not  sufficient.  To  extract  accurate 
signal  parameter  estimates,  one  has  to  distinguish  which  M  of  the  L  zeros  of 
G(z)  are  related  to  the  signal  parameters  (these  M  zeros  are  called  signal  zeros) 
and  which  L-M  zeros  (called  extraneous  zeros)  are  related  to  noise  in  the  data. 

In  chapter  3  we  present  a  method  for  this  purpose.  We  pick  the  M  (signal) 
zeros  out  of  the  L  zeros  of  G(z)  using  a  simple  least  squares  criterion.  Using 
this  procedure,  Prony's  method  can  be  used  at  low  SNR  as  well.  This  modification 
also  provides  a  way  of  estimating  the  value  of  M. 

(2)  Further  improvement  in  estimation  accuracy  is  achieved  by  adaptive 
prefiltering  using  singular  value  decomposition  (SVD)  or  eigenvalue-eigenvector 
decomposition.  This  part  of  the  work  is  a  summary  of  our  published  work  [52-56]. 
This  improvement  stems  from  the  following  observation.  The  exact  values  of  the 
exponent  parameters  {s^}  can  be  obtained  from  the  zeros  G(z)  if  the  data  is 
noiseless,  using  Prony's  method.  Therefore  it  is  reasonable  to  expect  that  if 
the  SNR  in  the  data  can  be  improved  by  'filtering',  then  more  accurate  parameter 
estimates  can  be  obtained  from  the  zeros  of  G(z) .  But  to  increase  the  SNR  in 
the  data  some  knowledge  of  the  signal  is  needed.  Although  we  do  not  know  the 
value  of  the  signal  parameters,  the  fact  that  they  are  a  sum  of  exponentials 
can  be  used  for  this  purpose.  Let  the  data  saatples  y(n) ,n-l ,2, . . .N  be  arranged 
in  the  form  of  a  Toeplitz  matrix  A. 


y (k+1 )  y(k)  .  .  .  y(2) 


A  “ 


y(N-l)  y(N-2)  .  .  .  y(N-k) 


1.8 


If  Che  daca  is  noiseless ,  Che  rank  of  A  will  be  M,  provided  chac  k  sacisfies  cer- 
Cain  constraints  (see  chapter  2) .  This  property  is  peculiar  to  Che  exponential 
signals  we  are  dealing  with.  This  knowledge  regarding  the  signal  can  be  used 
to  improve  the  SNR  in  the  data.  SVD  provides  a  way  of  achieving  this.  Details 
are  given  in  chapter  4. 


1.5:  ORGANIZATION  OF  THE  DISSERTATION: 

Chapter  2  is  concerned  with  properties  of  the  zeros  of  the  prediction-error 
filter  polynomial  G(z)  when  its  coefficients  are  computed  from  a  noiseless  (signal 
only)  data  sequence  y(n).  This  chapter  does  not  address  the  main  problem  of  esti¬ 
mating  the  signal  parameters  {s^} ,  {a^}  and  M.  But  the  results  obtained  regarding 
the  zeros  of  G(z)  help  provide  Insight  into  this  problem. 

Chapters  3  and  4  are  concerned  with  algorithms  for  estimating  the  desired 
signal  parameters  from  a  noisy  data  sequence  y(n).  In  chapter  3  we  present  a 
modification  to  Prony's  method.  This  modification  makes  it  possible  to  estimate 
the  number  of  signal  components  M  in  the  data  and  to  accurately  estimate  the 
parameters  at  lower  SNR  values  than  previously  possible  with  Prony's  method. 

In  chapter  4  we  present  two  methods  based  on  singular  value  decomposition 
(SVD)  to  obtain  even  more  accurate  estimates.  These  methods  are  closely  con¬ 
nected  to  several  other  techniques  based  on  eigenvalue/eigenvector  decomposi¬ 
tion  of  an  estimated  autocorrelation  matrix  of  data  samples.  An  important 
contribution  in  chapter  4  is  to  describe  these  connections  explicitly. 


In  chapter  5,  computer  simulation  results  are  presented.  The  estimation 
accuracy,  measured  in  terms  of  bias  and  variance  of  the  parameter  estimates  of 
different  methods,  is  calculated  based  on  several  independent  trials  and  com¬ 
pared  to  the  appropriate  Cramer-Rao  bounds. 

In  chapter  6,  major  conclusions  are  listed. 


CHAPTER  2:  ZEROS  OF  THE  PREDICTION-ERROR  POLYNOMIAL 


Throughout  this  dissertation  we  have  used  the  zero  locations  of  a  poly¬ 
nomial  called  the  prediction-error  filter  polynomial  G(z)  to  estimate  the  un¬ 
known  exponent  parameters  s^,k"l,2, . . .M.  See  section  1.1  for  definition  of 
the  problem.  The  methods  we  have  suggested  differ  primarily  only  in  the  manner 
in  which  the  coefficients  of  G(z)  are  computed  from  the  data  samples  y(l),y(2), 
...y(N).  Ideally,  an  M  th  degree  polynomial  G(z)  should  be  sufficient  to  esti¬ 
mate  the  unknown  parameters  s^.k-1,2, . . .M.  However,  when  the  data  is  noisy,  as 
mentioned  before,  some  redundancy  in  the  degree,  L>M,  of  G(z) ,  improves  the 
accuracy  of  the  estimates  of  s^  [43-49,18].  In  this  chapter  we  shall  study  the 
properties  of  the  zeros  of  G(z),  as  a  function  of  different  values  of  L,  for 
fixed  value  of  N,  the  number  of  data  samples,  and  M,  the  number  of  signal  compon¬ 
ents  in  the  data.  For  analytical  tractability  we  assume  that  the  sequence  y(n) 
is  noiseless,  l.e.  that  it  is  a  sum  of  samples  of  M  unknown  exponential  signals 
only.  Although  this  assumption  is  unrealistic,  the  insights  we  gain  by  studying 
the  properties  of  zeros  of  G(z)  are  helpful  in  the  more  practical  case  of  noisy 
data,  discussed  in  subsequent  chapters. 

There  are  two  main  points  made  in  this  chapter.  Firstly,  it  is  pointed 
out  that,  for  a  given  value  of  N,  there  are  upper  and  lower  limits  on  the  value 
of  L,  the  degree  of  the  polynomial  G(z) .  What  do  we  mean  by  these  limits?  If 
and  only  if  L  lies  in  between  these  limits  can  we  hope  to  find  the  parameters 
s^,k«l,2, . . .M  from  the  zeros  of  G(z).  This  fact  is  important  to  keep  in  mind 
because  in  chapters  3  and  4,  we  try  to  improve  the  accuracy  of  the  estimates  of 
the  parameters  when  the  data  is  noisy,  partly  by  using  L  values  larger  than  M, 
and  the  limits  on  L  derived  in  this  chapter  serve  as  bounds  in  that  case.  The 


limits  on  L  values  are  derived  in  section  2.1.  Secondly,  if  L  is  chosen  greater 
than  M,  the  polynomial  G(z)  has  L-M  extra  zeros  (other  than  the  M  signal  zeros 
which  fall  at  eSk,  k-l,2,...M).  These  L-M  extraneous  zeros  turn  out  to  be  a 
necessary  evil  in  our  estimation  algorithms  presented  in  chapter  4.  It  is  im¬ 
portant  to  be  able  to  Identify  the  M  signal  zeros  of  G(z)  from  the  L-M  extraneous 
ones,  so  that  the  (estimates  of)  parameters  s^,k«l,2, . . .M  can  be  found  from  the 
locations  of  the  signal  zeros.  This  is  facilitated  by  imposing  certain  constraints 
on  the  coefficients  of  the  polynomial  G(z)  as  explained  in  sections  2.2  and  2.3. 

The  effect  of  such  constraints  on  G(z)  is  to  cause  the  L-M  extraneous  zeros  of 
G(z)  to  always  fall  inside  the  unit  circle,  |z|*l,  in  the  z-plane.  That  is, 
these  L-M  zeros  will  have  magnitude  less  than  unity.  This  property  is  used  in 
distinguishing  the  signal  zeros  from  the  extraneous  ones.  The  constraints  occur 
in  a  natural  way  when  our  singular  value  decomposition  (SVD)  based  algorithms  are 
used  (See  Chapter  4) . 

2.1:  THE  LOCATIONS  OF  THE  SIGNAL  ZEROS  OF  G(z) : 

In  this  section  we  shall  show  that  the  degree  L  of  G(z)  has  maximum  and 
minimum  limits.  The  noiseless  signal  sequence  y(n)  is  given  by  the  formula 

M 

y(n)  -  I  a.eSkn  n-l,2...N  2.1 

k-1  K 

The  exponent  parameters  s^  are  assumed  distinct.  Let  £*  be  a  (L-fl)x(l)  vector 
of  coefficients  of  a  polynomial  G(z). 

(g„»g1.g2---gL)T  2.2 

G(z)  ■  I  g,  z  ^  2.3 

k-0  K 

'T'  denotes  matrix  transpose.  Let  us  arrange  the  data  samples  y(n)  in  the  form 
of  a  (N-L)X(L+1)  Toeplitz  matrix  A’ . 
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y(L+l) 

y(L+2) 

y(L)  . 

y(L+D . 

..  y(l) 

..  y(2) 

y(N) 

y(N-l) . 

..  y(N-L) 

Proposition  1: 

If  the  coefficient  vector  £  satisfies  A£  £'■(),  and  if  L  satisfies  the  in- 

Sir 

equality  M<J.<.(N-M) ,  then  G(z)  has  M  of  its  L  zeros  at  e  ,k*l,2,...M. 

This  property  is  easy  to  see  from  the  following  observation.  Any  row  of 
A^  can  be  written  as  a  linear  combination  of  the  following  M  linearly  indepen¬ 
dent  vectors. 

— 9i/  — 2sir  — Lsv 

^  -  (1,  e  ,  e  , .  e  ),  k-l,2,...M  2.5 

That  is,  the  rank  of  A£  is  M,  as  long  as  A£  has  at  least  M  rows,  i.e.,  if  (N-L)>M 
or  L<_(N-M) .  The  null  space  of  A£  has  dimension  L+l-M.  Since  £'  lies  in  the  null 

space  of  A^,  f^g'-O,  k-l,2,...M,  if  L<.(N-M) .  Therefore, 

— 9u  -2si,  -Lsu 

go+Sie  +82®  + . +SL®  “°»  k-l,2,...M  2.6 

In  other  words,  the  polynomial  G(z)«  E  g,z  has  zeros  at  e  ^  ^  1»2,...M. 

k-0  K 

However  L  has  to  be  equal  to  or  greater  than  M,  in  order  that  the  null  space  of 
Af  has  at  least  dimension  one.  If  L*M,  the  null  space  of  A£  has  a  unique  vector 
which  is  proportional  to  that  given  by  Prony's  method  [32].  If  L>.(N-M)  A£  will 
have  less  than  M  rows  (the  rank  of  A^<M)  and  the  property  in  equation  2.6  will 
not  be  true  in  general.  Hence  L  should  satisfy  the  inequality  M<L<_(N-M)  in  order 

*k 

that  G(z)  may  have  zeros  at  e  ,k-l,2,...M. 

Now  let  us  define  a  (N-L)x(L+l)  Hankel  matrix,  called  a  backward  data  matrix 
A£,  in  which  the  N  data  samples  y(n)  are  complex  conjugated  and  arranged  in  re¬ 
versed  order  compared  to  the  forward  data  matrix  defined  in  formula  2.4  de¬ 
notes  complex  conjugate. 


i 


y*(l)  y*(2) 

y*(2)  y*(3) 


y*(L+l) 

y*(L+2) 


y*  (N-L)  y*  (N-L+l ) . y*(N) 


Proposition  2: 


If  a  coefficient  vector  £'  satisfies  the  homogeneous  equation  A^g'»0,  and 

_g* 

if  L  satisfies  the  inequality  M<L<(N-M) ,  the  polynomial  G(z)  has  zeros  at  e  k, 
k-l,2,...M. 

Following  proposition  1,  any  row  of  A£  can  be  written  as  a  linear  combina 
tlon  of  the  following  M  linearly  independent  vectors. 


s?  2s? 
-  (1,  e  ,  e  , 


e  ),  k«l,2,...M 


if  L<(N-M).  Thus,  as  before,  b  £'«0,  k*l,2,...M.  Therefore, 

sk  2s?  Lsv 

gg+gj^e  +g2e  + . +g^e  **0»  k“l»2,...M  2.9 

^  -k  _Sk 

which  means  the  polynomial  G(z)«  E  g,z  has  zeros  at  e  ,  k-l,2,...M.  The 

k-0  k 

rest  of  the  argument  is  the  same  as  in  proposition  1.  It  is  noted  that  these 
M  signal  zeros  are  the  complex  conjugates  of  the  reciprocals  of  the  correspond¬ 
ing  M  signal  zeros  in  proposition  1.  In  other  words,  the  M  signal  zeros  of  pro 
position  I  and  2  are  the  reflections  of  one  another  about  the  unit  circle. 

If  the  signal  sequence  y(n)  is  composed  of  undamped  sinusoidal  signals 

(i.e.,  all  s,  '  s  are  purely  imaginary  numbers)  then  the  polynomials  in  proposi- 
*  + 

sk  ”8k 

tion  I  and  2  will  have  the  same  signal  zeros,  since  e  and  e  .  In  this  case 
the  rows  of  A£  and  A£  are  spanned  by  the  same  vectors,  since  f^  -  b^,  k-1,2,... 
Thus  the  two  data  matrices  can  be  combined  to  form  the  (2 (N-L)  x  (L+l))  forward 
backward  data  matrix,  Al. 


Proposition  3: 


If  the  vector  £'  satisfies  A^g'^O  and  if  L  satisfies  the  inequality 

M<Ls(N-  M/2  ),  then  the  polynomial  G(z)  has  M  of  its  zeros  on  the  unit  circle 
*  • 
sk  "sk 

at  e  *e  ,  k*l,2,...M.  The  data  sequence  y(n)  is  assumed  to  be  composed  of 
undamped  sinusoids.  M/2  stands  for  an  integer  rounded  to  the  next  larger 
integer  value  if  M/2  is  not  an  integer. 

The  arguments  in  this  case  are  identical  to  those  in  proposition  1  or  2. 

The  number  of  rows  in  is  2(N-L).  should  have  at  least  M  rows  or  2(N-L)>M 

if  G(z)  is  to  have  the  property  given  in  equation  2.6  or  2.9.  Hence,  L  should 
satisfy  MiLi(N-  M/2  ).  Henderson's  [57]  result  is  closely  related  to  our  propo¬ 
sition  1.  However,  our  results  in  this  chapter  differ  from  his  in  that  we  assume 
only  a  single  data  record  of  N  samples  is  available  for  processing.  The  idea  of 
using  the  data  in  both  forward  and  backward  directions  has  previously  been  used 
by  Burg  [58],  Ulrych  and  Clayton  [45],  Nuttall  [59],  and  others. 

2.2:  MINIMUM  PHASE  PROPERTY  OF  THE  EXTRANEOUS  ZEROS: 

In  the  previous  section  we  established  that  the  polynomial  G(z)  has  zeros 
at  desired  locations  in  the  Z  plane  from  which  the  signal  parameters  {s^}  can  be 
found  if  the  degree  of  the  polynomial  satisfies  certain  Inequalities.  The  poly-  ' 
nomlal  G(z)  also  has  L-M  other  zeros  called  extraneous  zeros  which  turn  out  to 
be  necessary  in  our  estimation  procedure.  They  are  necessary  because  a  polynomial 
G(z)  with  L>M  gives  more  accurate  estimates  of  the  parameters  {s^}  if  the  data  is 
noisy.  But,  they  pose  a  problem  since  we  have  to  identify  the  M  signal  zeros 
from  the  L-M  extraneous  zeros  before  estimating  the  parameters  {s^}.  in  this 
section  we  address  the  problem  of  finding  a  coefficient  vector  £'  for  the  poly¬ 
nomial  G(z)  which  would  simplify  this  identification  problem. 

The  location  of  the  L-M  extraneous  zeros  of  G(z)  depends  on  the  choice  of 
the  vector  which  lies  in  the  null  space  of  A£  (or  A£  or  A^  as  the  case  may  be) . 
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Recall  thaC  Che  dimension  of  Che  null  space  of  (or 

Chere  are  ac  lease  L+l-M  differenC  choices  for  £' .  Since  £'*0.,  £  will  also 
lie  in  Che  null  space  of  A^+A^  where  '+'  sCands  for  macrix  complex  conjugace 
Cranspose.  Therefore,  £'  can  be  chosen  as  an  eigenvecCor  corresponding  Co  one 
of  Che  L+l-M  zero  eigenvalues  of  A^+A^.  This  approach  is  relaCed  Co  Che  Pisarenko 
mechod  [60]  and  we  shall  say  more  abouC  Chis  in  chapeer  4.  The  problem  wich 
Chese  choices  of  _g_'  is  chac  Che  L-M  excraneous  zeros  of  G(z)  can  fall  anywhere 
in  Che  Z  plane  and  may  noc  be  idencifiable  from  Che  M  signal  zeros  in  Che  ab¬ 
sence  of  any  prior  informacion.  The  face  ehac  Che  eigenvecCor  has  unic  lengch, 

(  Ill'll  *1)  does  noC  in  any  way  consCrain  Che  loeacion  of  Che  L-M  excraneous 
zeros. 

In  Che  nexC  proposicion  we  find  a  unique  veccor  g1  (sacisfying  A£g/*0  or 
Af.S.,=.0  or  Afb£.,:“2.  as  c^e  case  may  be)  which  resulCs  in  a  polynomial  G(z)  wich 
cercain  desirable  propercies.  In  proposicion  5  we  show  how  Co  calculace  such  a 
veccor  £'  from  Che  given  daCa  samples. 

Proposicion  4: 

L  -k 

The  L-M  excraneous  zeros  of  Che  L  ch  degree  polynomial  G(z)*  I  g^z 

k*0 

fall  inside  Che  unic  circle,  |z|*l,  regardless  of  Che  M  signal  zero  loeacions 

T 

if  ics  coefficienc  veccor  £.'m(g0 ,  gx»  g? . g^)  which  saCisfies  A^g/HD 

(or  A^g'-O  or  A^g*»0)  is  chosen  co  have  Che  following  conscrainCs:  gQ  *  1 
2  2  2  2 

and  IgjJ  +|g2|  +|g3|  + . +  |g^|  *-s  It  is  assumed  ChaC  L  saCis¬ 

fies  Che  inequallcies  specified  in  proposicion  1  (or  2  or  3  as  Che  case  may  be) . 

As  long  as  £*  saCisfies  Che  homogeneous  equaCion  ic  follows  from  proposi- 

* 

sk  “sk 

Cion  1  (or  2  or  3)  chac  G(z)  has  M  signal  zeros  aC  e  ,  k*l,2,...M  (or  e  , 
k-l,2,...M).  We  can  facCor  Che  polynomial  G(z)  inco 


A£  or  A^)  is  L+l-M.  Hence 


G(z)  -  G,  (z)  G,  (z) 


2.11 


■J  ■  .  ■  . 


. 1  ■  * 
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Gx(z)  -  1  +  b2z  1+  b2z  2+ 


+  bMZ 


-M 


G2(z)  -  1  +  CjZ  l+  c2z"2+ 


-(L-M) 


—  +  ‘t-m2 

The  factor  Gx(z)  is  supposed  to  have  the  M  signal  zeros  which  fall  at  known  lo¬ 
cations  as  per  propositions  1  or  2  or  3.  We  are  however,  interested  in  the  lo¬ 
cations  of  the  extraneous  zeros  which  are  the  zeros  of  the  polynomial  G2(z), 
when  the  coefficients  of  G(z)  are  chosen  such  that  Q  is  minimum. 

Q  -  1  +  |g|2  +  |g|2  + . +  IgJ2  2.12 

Note  that  the  first  coefficient  is  chosen  to  be  unity. 

Since  the  polynomial  G(z)  is  the  product  of  Gx(z)  and  G2(z),  its  coeffi¬ 
cients  are  the  convolution  of  the  coefficients  of  G1(z)  and  G2(z).  That  is, 

00 

g  -  Z  c  .b.  2.13 

n  ,  n-k  k 
k»-“> 

where  gQ  *  1,  b0  *  1,  cQ  *  1,  b^  ■  0  for  i>M  and  i<0,  and  c^O  for  i>L-M  and 
i<0.  One  can  now  imagine  the  above  convolution,  as  performing  linear  predic¬ 
tion  (This  method  of  linear  prediction  is  called  the  'autocorrelation'  method. 


See  for  example,  ref.  [39].)  on  the  'data  sequence'  1,  bx ,  b2,  .  bM(which 


are  the  coefficients  of  Gx(z))  using  a  'prediction-error  filter'  defined  by 


the  vector  of  coefficients  of  the  polynomial  G2(z),  £,  £«(1,  cx ,  c2 .  ^L-M^ 

Then  the  coefficients  of  the  polynomial  G(z),  1,  gx,  g2,  .  g^^  form  the  'error 

sequence'  out  of  the  filter  £.  Thus  the  problem  of  minimizing  Q  in  formula  2.12 
is  the  same  as  one  of  finding  a  'prediction-error  filter'  £  that  minimizes  the 

error  energy  at  its  output,  operating  on  a  'data  sequence'  1,  bx,  b2,  .  b^. 

It  is  well  known  that  this  problem  results  in  a  set  of  Toeplitz  system  of  normal 
equations  (Yule-Walker  type,  see  ref.  [39]  for  example).  These  equations  are 


as  below. 
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R(0)  R(l)  ... 

• 

.  R(L-M-l) 

ci 

m  « 

R(-l) 

R(-l)  R(0)  ... 

C2 

R(-2) 

• 

R(M+I-L) . 

. R(0) 

_CL-M  _ 

R(M-L) 

The  elements  of  the  Toep.litz  matrix  and  the  vector  on  the  right  side  are  given 
by  the  formula 

M 

R(i)  -  Z  b  b  2.15 

k«0 

where  b^  are  the  coefficients  of  G  (z)  (b 0—l) «  The  R(i)  will  be  zero  beyond  lag 
M  since  the  'data  sequence'  itself  has  only  Mfl  samples  (b^'s).  The  solution 
to  the  above  linear  set  of  equations  minimizes  Q  and  determines  the  polynomial 
G2(z).  It  is  well  known  [39]  that  such  a  polynomial  has  all  its  zeros  (the  L-M 
extraneous  zeros  of  G(z))  inside  the  unit  circle.  A  proof  of  this  fact  is  given 
in  Appendix  A. 

The  above  discussion  is  valid  for  any  two  polynomials  G^z)  (with  known 
zeros)  and  G2(z)  (the  zeros  of  which  are  chosen).  Hence  the  following  statement: 

If  an  L  th  degree  polynomial  with  leading  coefficient  unity  has  M  known  zeros 
and  the  remaining  L-M  zeros  are  chosen  to  minimize  the  sum  of  magnitude  squares 
of  its  L+i  coefficients,  then  the  L-M  chosen  zeros  will  have  magnitude  less  than 
unity.  This,  of  course,  is  a  restatement  of  the  autocorrelation  method  of  linear 
prediction. 

Proposition  5: 

The  coefficient  vector  £'  of  the  polynomial  G(z),  satisfying  the  constraints 
mentioned  in  proposition  4,  can  be  found  by  simply  solving  a  linear  system  of  equa¬ 
tions. 

Since  £*  satisfies  the  homogeneous  equation  A^£,b0  (or  A^'-O  or  A^'-O) 
and  g0Bi,  we  can  rewrite  the  homogeneous  equation  as  follows. 


where  £  -  (gx,  g2,  .  %)  •  &'  “  <x»  4  >  »  and  A^  is  partitioned  as  Af  - 

(h  A  ) .  It  is  easy  to  see  that  the  columns  of  Af  and  the  column  vector  h  lie 
~f  f  sk  2sk 

in  the  span  of  the  M  linearly  independent  vectors,  (1,  e  ,e  . 

e(L“l)sk)Tf  k«l,2,...M.  The  rank  of  Af  is  also  M.  Although  there  are  many 

vectors  which  satisfy  equation  2.16,  the  one  which  has  minimum  Euclidean  length 

(or  minimum  value  for  |gx|^+  |g2|^+  .  +  Is^l^  can  f°und  as  follows: 

a  -  -A#  h  2.17 

S.  Af  “f 

where  A^  is  the  pseudoinverse  of  A^  [61].  This  vector  £  also  minimizes  ||£?|| 
with  the  constraint  that  g0»l. 

Now  consider  the  situation  when  we  find  two  coefficient  vectors  £^  and  £^ 
using  the  data  matrices  A£  and  A£,  respectively,  on  the  same  signal  sequence  de¬ 
fined  in  formula  2.1.  Let  £^  and  satisfy  A£  £^-0  and  A^  £^-0,  respectively, 
and  have  the  constraints  specified  in  proposition  4.  Let  the  corresponding 
polynomials  be  called  G^( z)  and  Gb(z). 


Proposition  6: 

The  L  th  degree  ((N-M)iLsM)  polynomials  Gf(z)  and  Gb(z)  have  the  same 
L-M  extraneous  zeros.  This  result  can  be  used  in  identifying  the  M  signal  zeros 
from  the  L-M  extraneous  zeros  in  the  presence  of  noise  in  the  data. 

From  propositions  1  and  2  it  follows  that  the  M  signal  zeros  of  Gf(z)  are 

the  reflections  of  the  M  corresponding  signal  zeros  of  G  (z)  about  the  unit 

sv  f  -sk 

circle.  If  e  ,  k«l,2,,..M  are  the  signal  zeros  of  G  (z) ,  then  e  ,  k-l,2,...l 

are  the  signal  zeros  of  Gb(z).  Let  the  polynomial  factors  corresponding  to  the 

signal  zeros  of  Gf(z)  and  Gb(z)  be  named  G*(z)  and  Gb(z)  respectively.  Due  to 

this  special  property  of  the  signal  zeros,  the  coefficients  of  G^(z)  are  the 

reversed,  complex  conjugated,  scaled  version  of  the  coefficients  of  G^(z).  That 

is,  if  the  coefficients  of  g5(z)  are  1,  bx,  b2,  ••••  b«  then  the  coefficients  of 
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b  *  * 

of  G^(z)  are  (bM>  b^. 


*  * 

b i , 1 ) / .  Now,  recall  that  the  location  of  the 


L-M  extraneous  zeros  are  determined  by  the  solution  of  the  equations  in  2.14. 

The  solution  vector  £  depends  on  the  values  of  R(i)'s.  However,  looking  at  the 
way  in  which  the  R(i)'s  are  computed  in  formula  2.15,  it  is  clear  that  they  are 
unaffected  by  the  reversal  and  complex  conjugation  of  b^'s.  And  the  scale  fac¬ 
tor  l/\,  occurring  in  the  coefficients  of  G^(z)  affects  both  sides  of  2.14 
equally,  leaving  the  vector  £  unchanged.  Therefore  we  conclude  that  the  L-M 
extraneous  zeros  of  G^(z)  and  G^(z)  remain  the  same  while  the  signal  zeros  are 
reflected  in  radius  about  the  unit  circle  as  the  prediction  direction  is  changed 
(i.e.,  the  homogeneous  equations  A^' ^*0  and  A^g^O.  are  used)  on  a  given  data 


sequence. 


-s& 


...  ,  .  .  ,  .  .sk  and  e  **  are  the  same. 

When  s,  are  purely  imaginary  (sinusoid  case) ,  e 

f  b  f  b 

Hence  the  polynomials  G*(z)  and  G°(z)  (the  factors  of  G  (z)  and  G  (z)  respec¬ 
tively)  are  the  same.  Thus  the  polynomials  G^(z),  G^(z)  (and  also  G^(z)) 
have  the  same  signal  and  extraneous  zeros  and  are,  in  fact,  identical. 


2.3:  SPREAD  OF  THE  EXTRANEOUS  ZEROS: 

In  the  previous  section  we  showed  that  the  L-M  extraneous  zeros  of  G(z) 
fall  inside  the  unit  circle  if  the  polynomial  coefficient  vector  £'  obeys  cer¬ 
tain  constraints.  We  shall  now  give  a  qualitative  argument  to  show  that  these 
extraneous  zeros  are  approximately  uniformly  distributed  around  the  inside  of 
the  unit  circle.  We  also  give  some  examples  to  Illustrate  this  point. 

In  equation  2.14  the  R(i)'s  are  zero  beyond  lag  M.  Therefore,  one  can 
imagine  the  R(i)  sequence  to  be  a  legitimate  autocorrelation  sequence  of  a 
moving  average  (MA)  process.  This  hypothetical  MA  process  would  be  the  output 
of  an  all  zero  filter  excited  by  white  noise,  the  zeros  of  which  coincide  with 
the  M  signal  zeros.  Further,  we  can  imagine  that  the  'prediction-error  filter' 


c,  is  attempting  to  whiten  this  process,  invoking  the  well-known  whitening 
property  of  the  autocorrelation  method  of  linear  prediction  [39].  In  fact,  an 
infinite  order  filter  £  will  be  required  to  perfectly  whiten  an  MA  process. 

Since  £  attempts  to  whiten  the  spectrum  (which  is  the  Z  transform  of  the  se¬ 
quence  R(i)  evaluated  on  the  unit  circle)  it  has  to  have  zeros  (the  extraneous 
zeros  of  G(z))  around  the  unit  circle  wherever  there  is  concentration  of  power, 
i.e.,  wherever  the  Z  transform  of  the  sequence  R(i)  does  not  have  a  (signal) 
zero  close  to  the  unit  circle. 

We  now  give  an  example  in  which  the  signal  sequence  is  given  by  the 

formula  y(n)  -  eSin  +  eS2n,  n-1 ,2, ....... .  where  sx  -  (0. 1+j 211  (0.35)) , 

s2  *  ( 0 . 0-t-j 211  (0.5))  and  s3  ■  (-0. 3+j 211(0. 65) ) .  The  signal  zeros  of  G^(z)  (which 

s  s  s  f 

can  be  found  by  solving  A£g^«0)  are  located  at  e  x,  e  2,  e  3.  The  factor  G^(z) 

of  G^(z)  is  II  (l-eSkz  *)•  The  plot  of  the  corresponding  'MA  spectrum'  ( J G? (z) | 

jw  ^“1  f  f 

z«eJ  )  is  shown  in  figure  2.1.  L  was  chosen  50.  The  factor  G2(z)  of  G  (z) 

corresponding  to  the  L-3  (-47)  extraneous  zeros  was  computed.  It  is  interesting 

to  note  that  this  factor  can  be  computed  by  solving  the  Toeplitz  equations  in 

2.14  by  using  Levinson's  recursions  [39].  The  magnitude  of  the  reciprocal  of 

G2(z)  evaluated  on  the  unit  circle  is  plotted  in  figure  2.2  Figure  2.3  shows 

the  47  extraneous  zeros  and  3  signal  zeros  of  G^(z).  Note  that  the  extraneous 

zeros  are  approximately  uniformly  distributed  within  the  circle.  However,  if 

the  data  is  noisy,  the  extraneous  zeros  tend  to  fall  closer  to  or  outside  the 

unit  circle  and  are  usually  observed  as  spurious  spikes  in  spectral  estimate 

plots  (see  for  example  ref.  [49]). 

If  the  data  sequence  is  composed  of  exponentially  damped  signals,  and  if 
we  use  the  forward  data  matrix  A£  (i.e.,  if  we  solve  A^g^-£  to  find  the  poly-’ 
nomlal  G^(z)),  we  notice  that  both  the  signal  zeros  and  the  extraneous  zeros 


r 
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have  magnitude  less  than  unity.  In  this  case  the  signal  zeros  cannot  be  identi 
fled  without  further  information.  Instead,  using  the  backward  data  matrix  A^ 
(i.e.,  using  the  polynomial  G^(z))  one  can  distinguish  the  M  signal  zeros  since 
only  they  fall  outside  the  unit  circle.  This  fact  is  made  use  of  in  chapters 
4  and  5.  An  example  is  given  in  figures  2.4  and  2.5. 

2.4:  SUMMARY: 

The  degree  L,  of  G(z)  has  to  lie  in  between  M  and  N-M  (or  N-M/2  in  the 


case  of  forward-backward  equations) .  Only  then  the  signal  parameters  can  be 

extracted  from  the  M  zero  locations  of  G(z) .  If  the  coefficients  of  G(z)  are 

2  2  2 

chosen  such  that  g^-l  and  |g^|  +  | g^ |  +  .  +  ig, |  is  minimum,  then  the 

L-M  extraneous  zeros  always  fall  inside  the  unit  circle.  This  fact  can  be 
used  to  identify  which  M  of  the  L  zeros  of  G(z)  are  signal  zeros. 
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CHAPTER  3:  MODIFICATIONS  TO  PRONY'S  METHOD 


In  this  and  the  next  chapters  we  present  actual  algorithms  for  estimating 
the  unknowns  M,  {s^},  and  {a^}  from  N  samples  of  noisy  data.  In  this  chapter, 
we  present  a  rather  simple  modification  to  Prony's  method.  The  advantages  of 
our  modifications  are  1)  an  estimate  of  the  value  of  M  can  be  found  unlike  in 
Prony's  method;  2)  the  method  can  now  be  used  at  lower  SNR  values  than  previously 
possible.  It  is  interesting  to  note  that  this  particular  modification  to  Prony's 
method  had  not  previously  been  attempted,  to  our  knowledge,  in  spite  of  the  vast 
literature  [34,35]  on  this  topic. 

Two  key  ideas  in  this  modification  are  as  follows.  Firstly,  using  Prony's 
method,  we  try  to  'fit'  L  (L>M)  exponentials  to  the  data  sequence  consisting  of 
M  exponentials  and  noise.  This  has  been  done  by  several  authors  [43-50,18]  and 
is  the  primary  reason  for  improved  accuracy  at  low  SNR.  Subsequently  we  identify 
which  M  of  the  L  exponentials  correspond  to  the  signal  part  of  the  data,  using  a 
least  squares  criterion. 

We  shall  first  describe  Prony's  method  briefly.  The  modifications  we 
propose  are  described  in  section  3.2. 


3.1:  PRONY'S  METHOD  AND  SOME  CLOSE  RELATIVES: 


Prony's  method  (according  to  Hildebrand  [32])  attempts  to  'fit'  a  model 
consisting  of  linear  combination  of  exponentials  to  a  data  sequence.  That  is 


{a^}  and  {s^}  are  determined  such  that, 

L  s.  n 

y(n)  s  I  a.  e  K 
k-1  K 


n-1 ,2, . . .N 


{s,  }  are  obtained  from  L  zeros  of  an  L  th  degree  polynomial  G(z).  The  coeffi- 

L  -k 

cients  of  G(z)  (-1  +  Z  g.  z  ) ,  g, ,  g_,  . . . .  l  are  determined  by  approximately 

k-1  k  12 

satisfying  the  following  set  of  equations. 
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I  y(n-k)  g.  -  -y(n) , 
k-1 


n-L+1  ,L+2 . N 


3.2 


We  shall  call  these  linear  prediction  equations  since  the  (L+l)st  sample  is  'pre¬ 
dicted'  by  a  linear  combination  of  the  L  samples  and  so  on.  The  'rationale'  for 
attempting  to  satisfy  3.2  is  that  if  3.1  is  satisfied  with  equality,  3.2  will 
also  be  satisfied  with  equality  and  exact  values  of  s^'s  can  be  found  from  the 
roots  of  G(z).  In  matrix  notation  of  chapter  2  the  above  'forward'  linear  pre- 


where 


written 

as  follows. 

Af£  3  "^f 

y(L) 

y(L-i)  . 

..  y(l) 

y(L+l) 

y(L)  . 

..  y(2) 

3.3 


£  *  [  »  82  *  • 

T 


if 


[y(L+l) ,y(L+2) , . .y(N) ] 


y(N-l)  y(N-2)  .  y(N-L) 

Prony's  method  (Hildebrand  version)  minimizes  the  equation  error  between  the  two 
vectors  on  either  side  of  the  equality  sign  in  equation  3.3,  which  is  ||Aj£  +  h^|| 

(  ||x||  stands  fo  the  L2norm  of  x. )  by  finding  a  coefficient  vector  £.  This 
error  is  also  called  prediction  error  and  Prony's  method  is  also  called  the 
'covariance  method'  of  linear  prediction  [39].  Such  a  vector  £  is  found  by  using 


the  pseudoinverse  [61]  of  A^,  i.e. 

-1 


&  “  -CAf+Af]  A^ 


3.4 


-k 


After  computing  the  £  vector,  the  polynomial  G(z)  *■  1  +  I  g,  z  is  formed. 

3k  ^  k-1 

Its  zeros  e  ,  k«l,2,...L  are  calculated.  The  a^'s  .  1  then  found  by  satisfying 

equation  3.1  in  the  least  square  sense. 

Generally,  Prony's  method  will  achieve  an  inferior  'fit'  to  the  data  com¬ 
pared  to  the  direct  least  squares  approach  taken  in  sections  1.3.1  and  1.3.2 


J 
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because  Prony's  method  attempts  to  minimize  an  equation  error  [[A^g  +  hf  |f  instead 

^  ^  s^n  2 

of  the  true  least  square  error  E  |y(n)  -  E  a.e  [  .  In  a  special  case,  when 

n-1  k-1  * 

the  data  is  composed  of  purely  exponential  signals,  (the  minimum  value  of)  both 
errors  will  be  zero  and  both  methods  will  achieve  an  exact  'fit'.  Recall  that 
in  chapter  2  we  considered  this  (noiseless)  case  in  detail,  where  equation  3.3 
was  satisfied  with  equality. 

In  the  above,  instead  of  using  the  'forward'  prediction  equations,  it  is 

possible  to  write  down  the  'backward'  prediction  equations  as 

L 

E  y*(n+K)  g^  s  y*(n),  n=l ,2, . . .N-L  3.5 

n*l 

or  in  matrix  form  (see  chapter  2) 

Ab&a-hfc  3.6 

and  achieve  a  different  'fit'  to  the  data.  The  coefficient  vector  g  obtained 
by  (approximately)  satisfying  3.6  (g  *  (A^+A^)  *  A^h)  will  be,  in  general, 
different  from  that  obtained  in  3.4.  Going  one  step  further,  both  forward  and 
backward  equation  errors  can  be  minimized  simultaneously,  by  finding  a  common 
vector  g.  That  is,  the  equations  A^^gs-h^  (see  chapter  2)  can  be  used  [45,59]. 
These  modifications  may  or  may  not  provide  a  better  'fit'  to  the  data  depending 
upon  the  data  sequence  y(n),  n=l,2,...N.  But  they  are  helpful  in  signal  para¬ 
meter  estimation  in  particular  situations.  In  chapter  2,  we  mentioned  that  by 
using  backward  prediction  equations  one  might  be  able  to  identify  the  M  signal 
zeros  of  G(z)  from  others.  Several  authors  [45,46,47,50]  have  found  that  using 
forward  and  backward  linear  prediction  equations  simultaneously  gives  more  ac¬ 
curate  parameter  estimates,  when  the  data  is  composed  of  undamped  sinewaves  in 
noise.  These  advantages  are  exploited  in  the  sequel. 

We  pause  a  moment  to  point  out  that  the  coefficient  vector  g  (in  both  cases 
of  A^gs-h^  and  A^gs-h^)  can  computed  using  fast  algorithms  developed  by  Morf 
et  al.  [40]  and  Marple  [41].  Now,  we  shall  describe  the  specific  modifications 
we  propose  and  the  need  for  them. 


3.2:  PROPOSED  MODIFICATIONS: 

The  modifications  we  propose  are  in  two  steps.  The  first  step  is  well 
known  [43-50,18]. 

3.2.1:  USING  A  VALUE  FOR  L  GREATER  THAN  M: 

If  the  data  consists  of  M  signal  components  and  noise,  it  is  unlikely 
that  one  can  'fit'  LSM  exponentials  as  in  3.1  to  the  data  sequence  using  Prony's 
method  and  obtain  accurate  estimates  of  the  signal  parameters.  This  is  because 
the  equations  3.2  (with  LaM)  are  satisfied  only  in  the  absence  of  noise  in  the 
data.  Instead,  one  can  'fit'  using  Prony's  method  L>M  exponentials  to  the  noisy 
data  sequence.  Although  the  true  value  of  M  is  not  known  usually  in  practical 
situations,  such  as  in  speech  signal  processing  [2],  an  upper  limit  on  M  can  be 
assumed  and  L  can  be  chosen  appropriately.  If  the  number  of  data  samples  is 
relatively  small  (say,  30),  L  can  be  chosen  approximately  equal  to  N/2.  Of 
course,  L  should  lie  within  the  upper  limit  of  N-M  (or  N-M/2)  as  pointed  out 
in  chapter  2. 

Intuitively,  the  L-M  additional  exponentials  we  'fit'  to  the  data  tend 
to  model  the  noise  part  of  the  data.  This  tends  to  improve  the  accuracy  with 
which  the  M  other  exponentials  in  the  'fit'  model  the  signal  part  of  the  data, 
eventually  leading  to  more  accurate  parameter  estimates.  A  problem  which  ensues 
from  the  use  of  L>M  exponentials  is  the  difficulty  in  determining  which  M  of 
the  L  exponentials  corresponds  to  the  signal  part  of  the  data. 

The  improvement  in  accuracy  achieved  by  using  a  value  of  L>M  and  the 
associated  problems  are  best  demonstrated  by  the  following  examples.  In  the 
first  example,  we  simulate  in  a  computer  a  sequence  y(n) ,  n»l,2,...25  consist¬ 
ing  of  two  complex,  undamped  sinewaves  in  white,  complex  Gaussian  noise.  We 
use  a  modification  of  Prony's  method,  described  above.  That  is,  the  equations 
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Af  b2-’  ~-fb  are  solved  to  find  £  (i.e.,  _g_"-(A^+A^)  *  A^h^) .  The  polynomial 

G(z)«l  +  I  g  z  is  formed  and  its  zeros  are  found.  These  zeros,  obtained 
k-1 

over  50  trials  in  which  different  noise  epochs  were  used,  are  plotted  in  figures 
3. 1-3.3  with  respect  to  the  unit  circle  |z|  *  1.  The  true  zero  locations  (on 
the  unit  circle)  corresponding  to  the  two  complex  sinewaves  are  shown  by  arrows. 
In  figure  3.1  we  chose  L*2.  That  is  G(z)  is  a  second  degree  polynomial.  If 
the  data  were  noiseless,  this  would  be  adequate  to  find  the  frequencies  of  the 
sinewaves.  But  the  noise  in  the  data  causes  the  zeros  of  G(z)  to  fall  far  away 
from  their  true  locations.  In  figure  3.2,  we  chose  L»12.  We  used  the  same  data 
sequences  as  before.  Clearly,  there  are  two  clusters  of  zeros  around  the  true 
frequency  locations,  indicating  improvement  in  estimation  accuracy.  The  situ¬ 
ation  at  lower  SNR  is  shown  in  figure  3.3 

Although  providing  redundancy  in  L  improves  the  accuracy  of  the  zero  lo¬ 
cations  and  hence  the  parameter  estimates,  the  L-2  extraneous  zeros  cause  a 
dilemma.  At  high  SNR,  these  zeros  behave  as  predicted  in  chapter  2  and  are  dis¬ 
tributed  within  the  unit  circle.  Hence,  as  usually  done  [47,493,  one  can  find 
the  zeros  closest  to  the  circle  and  use  their  angular  location  to  estimate  the 
frequencies.  Alternatively,  the  location  of  the  largest  ’spectral  peaks’  in  a 
’spectral  estimate* ,  which  is  the  magnitude  of  the  reciprocal  of  G(z)  evaluated 
on  the  unit  circle,  can  be  used  as  frequency  estimates  [45].  But  at  low  SNR, 
some  of  the  L-2  extraneous  zeros  of  G(z)  also  fall  close  to  the  unit  circle, 
and  could  cause  large  error  in  the  frequency  estimates.  These  zeros  cause  the 
large  spurious  spectral  peaks  observed  in  ’spectral  estimate’  plots  when  a  large 
value  of  L  is  used  (see  for  e.g.  ref.  [46]).  The  problem  is  even  worse  if  the 
data  is  composed  of  exponentially  damped  sinusoids  as  seen  in  figures  3.4  and 


Figure  3.1:  The  zeros  of  G(z)  for  50  Independent  trials.  The  data  sequence 

y(n+l)-e:52IIfln+^1+eJ2IIf2n+^24W(n)f  n-0, 1 ,2, . .  .24,  f1«0.52Hz,  f2-0.5Hz,  <^-11/4, 

2  2 

^2*0,  SNR  Is  defined  as  10  Log  (l/2o  ),  where  2o  is  the  variance  of  the  complex 
valued  noise  samples  w(n) .  SNR-20  db.  L  was  chosen  2. 


Figure  3.4:  The  zeros  of  G(z).  The  data  sequence  used:  y(n+l)-e  *  +eS2n+w(n) 
n«0, 1 ,2, . . .24.  SNR*20  db.  SNB.  is  defined  as  10  log  (l/2o^)  where  2o^  is  the 


variance  of  any  noise  sample  w(n) .  The  forward  linear  prediction  equations  are 


t  -It 

used,  i.e.,  ^»-(A^A^)  A^h^ .  Note  the  two  signal  zero  clusters.  The  true  lo¬ 


cations  are  marked  by  ’x’.,  s^-0. 1+j 211  (0.52)  i  s2*-0.2+j2II(0.42) 
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In  the  next  subsection  we  suggest  a  method  to  eliminate  the  L-M  extraneous 


zeros  using  a  least  squares  criterion. 


3.2.2:  ESTIMATING  THE  VALUE  OF  M  AND  ELIMINATING  THE  L-M  EXTRANEOUS  EXPONENTIALS 
Up  to  this  point  our  procedure  is  essentially  identical  to  Prony's  method. 
Now  we  make  use  of  our  knowledge  regarding  the  data  sequence,  i.e.,  that  it  con¬ 


tains  M  exponentials  and  relatively  minor  random  errors.  Secondly,  the  L  zeros 

h 


of  G(z)  which  correspond  to  L  exponentials  e  ,  k*l,2,...L  contain  the  desired 
exponentials  which  model  the  signal  part  of  the  data.  Thus  we  seek  to  find  M 


(if  M  is  not  known,  its  estimate  M  can  be  found  as  explained  below)  out  of  the 


sk. 


L  exponentials,  e  i,  lik^iL ,  i“l,2,...M  that  best  fit  the  data,  in  the  sense 


that  they  give  minimum  value  for  E,  defined  below. 


N 


E  -  E 
n-1 


M 


y(n) 


E  a,  e 
i-1  ^i 


skin 


Min 


a.  ,  i**  1,2,  • .  .M 


3.7 


The  above  minimization  involves  choosing  combinations  of  M  exponentials  out  of 

L 


the  L,  solving  a  linear  least  square  problem  for  the  a^  ,  and  checking  M 
combinations  to  find  the  best  combination,  i.e.  the  one  which  gives  the  minimum 
value  for  E.  The  corresponding  amplitude  estimates  are  obtained  as  a  by  product. 
Some  of  the  L  exponentials  can  be  eliminated  to  start  with,  using  prior  informa¬ 


tion.  For  example,  if  one  is  looking  for  frequencies  of  undamped  sinewaves. 


those  e  ' s  which  have  magnitude  far  greater  than  or  far  less  than  unity  can  be 
eliminated  before  choosing  the  best  subset  of  size  M.  Also,  an  efficient  subset 
selection  procedure  due  to  Hocking  and  Leslie  [62]  can  be  used  to  reduce  compu¬ 
tation.  The  author  came  to  know  about  this  procedure  from  the  work  of  Tufts  and 
Khorrami  [63]. 

a 

If  M  is  not  known  apriori  an  estimate  of  M  called  M,  can  be  found  as 
follows.  Choose  M»l,  and  find  the  best  subset  of  size  unity  that  best  fits  the 

A 

data.  Call  the  corresponding  minimum  error  E, .  Then,  choose  M“2  and  find  the 
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best  subset  of  size  two  and  the  corresponding  minimum  error  E2.  Repeat  the  pro¬ 
cedure  until  the  rate  of  decrease  of  the  error  with  increasing  values  of  M  is 
small,  consistent  with  the  modelling  of  the  residual  random  errors.  The  Integer 
i  at  which  shows  the  significant  drop  in  rate  of  decrease  is  taken  as  M.  An 
example  showing  the  value  of  the  minimum  error  vs.  M  is  given  in  figure  3.6. 
Simulation  results  using  these  modifications  are  presented  in  chapter  5. 

3.3:  DISCUSSION: 

Ideally,  to  fit  exponentials  to  a  data  sequence  y(n)  one  has  to  minimize 
N  M 

the  error  I  |y(n)  -  I 
n«l  k«l 

This  is  a  difficult  problem,  even  if  the  value  of  M  is  known.  Instead,  we  find 

„sk 

the  exponentials  e  's  separately  as  is  often  done,  using  Prony's  method.  How¬ 
ever,  we  have  made  use  of  the  fact  that  if  the  data  is  composed  of  exponentials 
and  noise,  overestimating  the  degree  L  (>M)  of  the  polynomial  G(z)  improves  the 
accuracy  of  the  M  signal  zero  locations.  Subsequently,  we  select  the  M  out  of 
L  exponentials  (or  zeros)  that  best  explain  the  data.  Of  course,  this  step 
assumes  that  the  random  errors  w(n)  in  formula  1 . 1  are  relatively  minor  compared 
to  the  signal  energy.  But  this  assumption  has  to  be  valid  to  a  lesser  or  greater 
degree  for  any  estimation  procedure  to  work.  Using  these  modifications  Prony's 
method  can  be  used  at  (relatively)  lower  SNR  than  previously  possible. 


skn^ 


with  respect  to  a^'s  and  s^'s  simultaneously. 


MlN*  MUM 


Figure  3.6:  Minimal  value  of  the  error  E  in  equation  3.7  for  different  values  of 
M  for  a  typical  realization  of  the  data  is  plotted  at  three  SNR  values. 
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CHAPTER  4:  SINGULAR  VALUE  DECOMPOSITION  AND 
IMPROVED  PARAMETER  ESTIMATION 

In  this  chapter,  we  shall  present  two  related  methods,  called  the  Tufts- 
Kumaresan  (TK)  method  and  the  Improved  Pisarenko  (IP)  method  for  estimating  the 
signal  parameters  and  M.  These  methods  tend  to  be  slightly  more  accurate  than 
the  modified  Prony  (MP)  method  presented  in  chapter  3. 

The  improvement  in  accuracy  of  the  two  methods  presented  in  this  chapter 
is  attributable  to  the  following  two  reasons.  Firstly,  as  in  the  previous 
chapter,  we  trade  redundancy  in  the  degree,  L  of  G(z) ,  for  increased  tolerance 
to  noise  in  the  data.  Secondly,  we  use  an  adaptive  filtering  method  to  improve 
the  SNR  before  estimating  the  signal  parameters.  Motivation  for  the  second 
point  is  as  follows.  Recall  that  in  chapter  2,  we  showed  that  the  parameters 
{sk>  can  be  found  exactly  from  the  zeros  of  an  L  th  degree  polynomial  G(z)  if 
the  N  data  samples  are  noiseless  and  L  obeys  certain  inequalities.  Therefore, 
one  might  expect  that  Improving  the  SNR  in  the  data  and  emulating  the  noiseless 
situation  as  closely  as  possible,  one  can  improve  the  accuracy  of  the  parameter 
estimates.  But  how  does  one  improve  the  SNR  in  the  data  when  the  data  record 
is  short  and  nothing  is  known  about  the  signal  parameters  {s^}  and 
though,  we  do  not  know  the  value  of  the  signal  parameters,  a  special  property 
of  exponential  signals  can  be  made  use  of  for  improving  the  SNR.  For  exponen¬ 
tial  signals,  as  we  saw  in  chapter  2,  the  data  matrix  A^  (or  A^  or  Afb  as  the 
case  may  be)  has  a  rank  M  when  the  data  is  noiseless.  We  might  call  the  sig¬ 
nals  which  have  this  property  low  rank  or  singular  signals.  This  Information 
can  be  used  to  get  rid  of  some  noise  in  the  data.  This  is  done  by  using  SVD 
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of  Che  data  matrix.  Once  this  is  done,  we  can  compute  a  polynomial  G(z)  as 
before,  and  obtain  more  accurate  estimates  of  {s^K 

The  definition  of  SVD  of  a  matrix  and  reasons  for  using  it  in  our  methods 
are  given  in  section  4.1.  In  the  TK  method  we  show  how  to  compute  the  coeffi¬ 
cients  of  the  polynomial  G(z)  using  SVD.  This  is  described  in  section  4.2. 

Several  researchers  [64-77]  in  the  past  have  used  eigenvalue-eigenvector 
decomposition  of  autocorrelation  matrices  to  estimate  certain  parameters.  These 
works  are  closely  related  to  our  methods  presented  in  this  chapter.  These  re¬ 
lationships  are  spelled  out  in  section  4.5.  Of  particular  importance,  from  a 
historic  point  of  view,  is  the  work  of  Pisarenko  [71].  He  discovered  that  the 
frequencies  of  M  undamped  sinewaves  can  be  found  from  the  roots  of  an  M  th 
degree  polynomial,  formed  with  the  elements  of  an  eigenvector  corresponding  to 
the  minimum  eigenvalue  of  an  (M+-1)  x  (M+l)  true  autocorrelation  matrix  of  data. 
Although,  Pisarenko  seems  to  have  pointed  out  this  fact  first,  the  procedure  he 
outlined  for  estimating  the  frequencies  from  a  finite  data  record  gives  very 
poor  estimates  [45].  In  section  4.3  we  show  how  to  improve  upon  his  idea  and 
how  it  relates  to  the  TK  method.  In  section  4.4  we  discuss  the  trade-offs  in¬ 
volved  in  choosing  a  value  for  the  degree  of  G(z) ,  for  a  given  value  of  N,  in 
order  to  obtain  the  most  accurate  parameter  estimates. 

4.1:  REASONS  FOR  USING  SVD  OF  A  DATA  MATRIX: 

SVD  is  the  generalization  of  eigenvalue  decomposition  to  include  rectan¬ 
gular  matrices.  Let  A  be  a  (mxn)  matrix.  The  SVD  of  A  is  a  product  of  three 
matrices  [61,51] 

A  -  UIV+  4*1 

The  matrices  U  and  V  are  unitary  [79]  and  I  is  a  rectangular  diagonal  matrix  of 
the  same  size  as  A  with  real,  non*negative  diagonal  entries.  The  diagonal 
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entries  ax,  o2,  a3 .  ,  called  the  singular  values  of  A,  are  conventionally 

ordered  with  the  largest  in  the  upper  left  hand  corner.  The  singular  values  of 
A  are  the  non-negative  square  roots  of  the  eigenvalues  of  A+A  or  A  A+. 

In  our  case  the  A  matrix  is  one  of  the  data  matrices,  A^,  A^,  or  A^  re¬ 
written  here  for  convenience. 


y(L) 

y(L-i) 

....  y(l) 

y(L+l) 

y(L) 

• 

....  y(2) 

y(N-l) 

• 

• 

y(N-2) 

....  y(N-L) 

y*(  1) 

y*(2) 

....  y*(L) 

y*(2) 

• 

y*(3) 

• 

....  y* (L+l) 

•  •  •  •  • 

• 

• 

y*(N-L) 

• 

• 

• 

•  •  •  •  • 

•  •  •  •  ■ 

....  y(N-l) 

The  matrix  A^  (A£  and  A^)  is  the  same  as  the  above  A^  (A^  and  A^)  except  for 
an  additional  column  to  its  left.  (See  section  2.1.)  In  our  estimation  algo¬ 
rithms  one  of  the  above  matrices  will  be  used  depending  on  the  type  of  signal 
in  the  data.  If  the  signal  is  composed  of  exponentially  damped  sinusoids  we 
may  choose  to  use  A^  (or  A£)  and  if  the  signals  are  undamped  sinusoids,  A^ 

(or  Aj^)  will  be  used.  We  shall  use  the  matrix  symbol  A  to  denote  any  of  the 
above  data  matrices  A^,  A^»  or  A^i  and  A'  to  denote  A£,  A^,  or  A^. 


An  Important  property  of  matrix  A  (or  A'),  is  that  it  has  a  rank  M  if  L, 

which  determines  the  number  of  columns  and  rows  of  A  (or  A'),  is  chost n  vithin 

the  limits  specified  in  chapter  2  (i.e.,  MiLsN-M  for  A^,  A^,  Aj. 

MiLi(N-M/2)  for  A^  and  A^)  and  the  data  is  noiseless.  Therefore  A^A  (or 
*f*  "f* 

A'  A')  and  A  A  (or  A'  A'  )  have  M  non-zero  eigenvalues.  Since  the  singular 
values  of  A  (or  A’)  are  the  (positive)  square  root  of  these  eigenvalues,  A 
(or  A')  has  M  non-zero  singular  values.  We  shall  call  the  singular  values  of 
A  .  ...>  o^,  q^|  -Q,  a^^*^****  The  number  of  zero  singular  values  de¬ 

pends  on  the  order  of  A  (or  A'). 


and 


and 


Another  important  property  of  A  (or  A')  regards  the  eigenvecttrs  of  A  A 
(or  A ' ^ A ' ) .  The  M  principal  eigenvectors  of  A+A  are  linear  combination  of  M 


linearly  independent  vectors  c^,i=l ,2 , . . .M,  where 

£l  =  tt.  e'sl.  r2s\  ... 


i«l,2,...M 
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This  fact  is  established  in  Appendix  B.  These  vectors  are  said  to  span  the 
signal  subspace.  The  reason  is  that  any  L  contiguous  (complex  conjugated) 


samples  (in  the  reversed  time  direction)  of  the  signal  sequence  can  be  repre¬ 


sented  as  a  linear  combination  of  c^’s.  The  other  eigenvectors  of  A^A  (or  A,+A*) 


are  orthogonal  to  the  M  principal  eigenvectors  (since  A  A  (or  A’  A')  is  Hermitian) 
and  hence  orthogonal  to  the  c^'s.  This  is  another  useful  property  of  the  eigen- 
space  of  such  matrices. 

If  the  data  is  noisy,  the  above  properties  are  not  exactly  true,  but  they 
tend  to  be  approximately  true.  That  is,  the  M  principal  singular  values  of  a 
noisy  matrix  A  (or  A')  tend  to  be  larger  than  the  rest  ****  (which 

were  originally  zero).  Also,  the  M  eigenvectors  which  correspond  to  the  M 


principal  eigenvalues  of  A  A  or  A  A  are  less  susceptible  to  noise  perturbations 


in  comparison  to  the  rest  of  the  eigenvectors.  These  statements  are  justifiable, 
qualitatively  at  least,  using  the  first  order  perturbation  results  on  eigenvalues 
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and  eigenvectors  of  matrices  due  to  Wilkinson  [91].  In  short,  Wilkinson's 
results  show  that  the  amount  of  perturbation  in  an  eigenvector  direction  (com¬ 
pared  to  the  noiseless  eigenvector  direction)  is  dependent  on  the  noise  level 

and  the  separation  of  the  corresponding  eigenvalue  from  the  rest.  Therefore, 

■|» 

the  M  principal  eigenvectors  of  A  A  (or  A'  A')  are  much  less  perturbed  since 
the  corresponding  eigenvalues  are  larger  and  generally  well  separated  compared 
to  the  rest  (whose  eigenvalues  were  zero  in  the  noiseless  case) .  This  robust¬ 
ness  property  of  the  principal  eigenvectors  is  made  use  of  in  our  improvements. 
Several  authors  have  utilized  this  property  of  the  principal  eigenvectors  in 
similar  circumstances.  See  section  4.5  for  details. 

The  above  facts  are  related  to  a  matrix  approximation  theorem  due  to 
Eckart  and  Young  ] 83 ] .  This  connection  was  recognized  by  Holt  and  Anthill 


[76],  Henderson  [57],  and  Tufts  et  al.  [56].  The  Eckart  and  Young  theorem 


states  that  the  rank  M  matrix  Aj^  minimizing  the  norm  1 1 A— AM 1 1  (defined  as  the 
trace  of  (A-A^) ^ (A-A^) )  is  given  by 

^  *  0V+  4'6 


where  is  the  matrix  I  with  the  singular  values  ,  oM|  ^ . set  to  zero . 


Note  that  the  matrix  A^  is  a  sum  of  M  outer  products  involving  the  M  principal 
singular  values  and  eigenvectors  of  A^A  and  A  a"*".  Thus,  it  is  satisfying  to 


note  that  the  use  of  M  principal  singular  values  of  A  and  the  corresponding 
principal  eigenvectors  of  A+A  (and  A  A+)  directly  follows  from  a  (reduced  rank) 


matrix  approximation  result.  Recalling  that  the  signal -only  data  matrix  had  a 
rank  M,  the  rank  M  approximation  (A^)  to  the  noisy  data  matrix  A  (or  A')  can 
be  expected  to  be  'closer'  to  the  signal-only  data  matrix  than  A.  We  shall 


now  present  a  computer  simulation  which  justifies  this  statement. 


We  construct  a  data  matrix  A  with  data  samples  consisting  of  two  complex 
sinewaves  and  noise.  Its  rank  M(»2)  approximation,  A^  is  found  by  using  SVD 


as  explained  above.  Another  data  matrix  A  is  constructed  with  signal  only.  Its 
rank,  of  course,  is  2.  Then  we  measure  the  closeness  of  the  matrices  A  and  A^  to 
A  by  computing  the  following  quantity  Q. 

Q  -  || A+A  -  fill  -  l|AM+AH  -  A*A  || 

Thus  if  Q  is  positive  it  means  that  the  matrix  A^  is  closer  to  A  than  A.  Five 
hundred  such  trials  are  performed  with  d'.fferent  noise  epochs  and  the  histogram 
of  Q  is  plotted  in  figures  4.1  and  4.2  at  two  different  SNR  values.  Even  at  0 
db  SNR,  Q  is  almost  always  positive,  showing  that  the  matrix  A^  is  almost  always 
closer  to  the  noiseless  data  matrix  A.  In  other  words,  computing  A^  (or  using 
the  M  principal  eigenvalues  and  eigenvectors  of  A^A)  Indeed  enhances  the  signal. 
See  also  ref.  [56,50]. 

Based  on  the  above  justifications  the  following  statements  can  be  made. 

(1)  From  the  size  of  the  singular  values  of  A,  an  estimate  of  M,  namely 
M  can  be  obtained.  This  is  because  the  principal  singular  values  tend  to  be 
larger  than  the  noise  related  singular  values  which  were  originally  zero. 

(2)  The  M  principal  eigenvectors  approximately  span  the  signal  subspace 
and  hence  using  them  as  opposed  to  the  whole  A  (or  A')  matrix  amounts  to  an 
effective  SNR  Improvement  [56]. 

It  may  be  noted  that  SVD  is  used  here  not  to  improve  the  numerical  sta¬ 
bility  of  computations  for  which  it  is  often  intended  [51,61],  but  to  improve 
the  SNR  in  the  data.  Equivalently,  the  eigenvalue  decomposition  of  A^A  or  A  a"*" 
can  be  used  with  little  loss. in  accuracy. 

4.2:  TUFTS-KUMARESAN  METHOD: 

In  this  method  we  show  how  to  take  advantage  of  the  properties  of  the 
principal  eigenvalues  and  eigenvectors  to  compute  a  polynomial  G(z)  which  re¬ 
sults  in  improved  estimation  accuracy.  We  set  up  as  in  the  modified  Prony  (MP) 
method  (in  chapter  3)  the  following  set  of  equations  using  the  noisy  data  samples 


|IATA-*A1l  -  HAjAr  A  AH 


. 


■v 

1 

\ 

Figure  4.1:  Histogram  of  Q-  ||A+A  -  A+A  ||  -  Ha^  -  A*!  || .  A  is  the  data  matrix 
with  noisy  data  samples.  The  same  formula  used  for  data  generation  in  figure  3.1 
is  used.  500  independent  trials  were  performed.  A  is  a  data  matrix  with  noiseless 
data  samples.  A^  is  the  rank  2  approximation  to  A.  ||x  ||  is  the  trace  of  X+X.  L 


is  chosen  12.  N-25.  SNR-20  db. 


All 
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y(n) ,n«l ,2, . . .N. 

A  £  s  -h  4.7 

Unlike  Che  MP  method  we  do  noC  attempt  to  satisfy  the  above  equations  in  the 
least  square  sense  directly.  First,  we  Improve  the  SNR  in  the  data  as  described 
above  by  computing  the  SVD  of  A.  Based  on  the  size  of  the  singular  values  of  A 
we  decide  on  the  value  or  M.  AM  which  approximates  A  in  the  least  square  sense 

A 

and  has  a  rank  M  is  given  by 

Ag  -  UZ-V+  4.8 

Then,  we  solve  the  following  linear  equations  instead  of  the  original  equation 
in  4.7. 


a££  "  -j*  4.9 

There  are  many  possible  solutions  to  these  equations  because  the  rank  of  Ag  is 
less  than  full.  But  using  the  pseudoinverse  of  A^,  we  find  the  vector  g  as 
follows. 

£  «  -A^h  4.10 


This  pseudoinverse  solution  has  a  desirable  property.  That  is,  it  finds  a 
solution  vector  £,  which  has  the  minimum  length  ||£  ||.  Note  that  this  is  the 
condition  we  Imposed  on  the  £  vector  in  chapter  2  to  restrain  the  L-M  extra¬ 
neous  zeros  to  be  inside  the  unit  circle.  The  pseudoinverse  of  A^  can  be 
written  as  [61], 


# 


•i 


uz„v 

n 
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where  Z«  is  obtained  from  Z^  by  replacing  each  positive  diagonal  element  by  its 
reciprocal.  Then  the  minimum  norm  £  vector  can  be  written  as, 

A 

4  ■  v,  ~h  4-12 

k-1  k 

where  v^  and  u^,  k-1, 2, . *  are  the  columns  of  V  and  U.  The  minimum  norm 


vector  £  can  also  be  written  as 
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1  -  -  2  -J—  (^V^Ot^  4.13 

k*l  k 

where  are  the  eigenvalues  of  A^A  or  A  A*. 

Thus  in  the  TK  method,  we  try  to  emulate  the  noiseless  case  by  first 
finding  a  matrix  Ag  which  is  as  close  as  possible  to  the  noiseless  signal 
matrix  and  then  finding  a  coefficient  vector  £  which  has  minimum  norm.  If  M 
is  chosen  to  be  equal  to  or  slightly  greater  than  M  and  the  data  SNR  is  high 
enough,  the  behaviour  of  the  signal  and  extraneous  zeros  of  G(z)  obtained  by 
the  TK  method  are  close  to  that  predicted  in  the  noiseless  case  in  chapter  2. 

Figure  4.3  shows  the  zeros  of  the  polynomial  G(z)  obtained  by  using  the 
TK  method  for  several  trials.  The  coefficients  of  G(z)  are  calculated  as  in 
equation  4.12.  The  same  data  sets  and  value  of  L  used  in  figure  3.3  are  used. 
For  comparison,  the  zeros  of  G(z)  when  the  data  is  noiseless  is  plotted  in 
figure  4.4.  Comparing  figures  3.3,  4.3,  and  4.4,  it  should  be  clear  that  the 
effect  of  using  SVD  is  to  enhance  the  SNR  and  emulate  the  noiseless  case  as 
closely  as  possible.  Figure  4.5  (compare  this  with  figure  3.5)  shows  that 

situation  in  which  the  signal  is  composed  of  exponentially  damped  signals. 

The  equations  A^£ 

Numerical  analysts,  attempting  to  solve  ill-conditioned  linear  equations 
such  as  here,  have  routinely  used  orthogonalizaton  techniques  such  as  QR  decom¬ 
position  [61]  or  truncated  SVD  as  above,  to  solve  them.  But  our  situation  is 
slightly  different.  In  our  case,  in  addition  to  using  SVD  of  A,  we  can  choose 
the  value  of  L  (in  other  words  we  can  alter  the  set  of  equations  A^s-h  itself) , 
which,  as  we  saw  in  the  previous  chapter,  has  considerable  effect  on  the  im¬ 
provement  in  estimation  accuracy.  The  choice  of  the  value  of  L  is  discussed 

in  section  4.4. 

We  may  point  out  that  Strand  [80]  has  proposed  a  method  to  compute  the 
truncated  SVD  solution  as  in  4.12  without  having  to  compute  the  SVD  explicitly. 


■h^  were  used  in  this  case. 
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4.3:  IMPROVED  PISARENKO  METHOD: 

In  this  section  also,  we  use  SVD  of  a  data  matrix  (A'),  but  the  emphasis 
is  on  using  the  smaller  singular  values  and  corresponding  eigenvectors.  We 
improve  upon  an  idea  proposed  by  Pisarenko  [71]  and  hence  the  name  improved 
Pisarenko  method.  It  is  not  significantly  different  from  the  TK  method  philo¬ 
sophically.  But,  it  helps  us  point  out  the  links  between  several  eigenvector 
methods  that  are  available  today  (August  1982) .  These  links  are  described  in 
section  4.4. 

4.3.1:  PISARENKO  METHOD  AND  ITS  DIFFICULTIES: 

Pisarenko  [71]  seems  to  have  originated  the  idea  of  using  the  minimum 
eigenvalue-eigenvector  polynomial  to  find  the  frequencies  and  power  of  M  un¬ 
damped  sinewaves  in  white  noise.  By  'eigenvector  polynomial'  we  mean  the 

T 

following.  If  e*(e  ,e . a. )  is  an  (eigen) vector,  then  the  corresponding 

0  1  Lt  _ 

L  -k 

eigenvector  polynomial  is  defined  as  E  e,  z  .  Pisarenko,  using  the  elements 

k-0 

of  the  eigenvector  corresponding  to  the  minimum  eigenvalue  of  an  (Mfl)x(Mfl) 
true  auto  correlation  matrix  of  data  consisting  of  M  complex  sinewaves  and 
white  noise,  constructed  an  M  th  degree  polynomial  and  showed  that  the  roots 
of  this  polynomial  (which  lie  on  the  unit  circle)  gave  the  frequencies  of  the 
sinewave. 

There  are  at  least  three  difficulties  associated  with  the  Pisarenko 
method.  (1)  The  actual  algorithm  he  devised  for  finite  data  records  does 
not  utilize  the  data  effectively  to  obtain  accurate  estimates.  One  reason  is 
the  limitation  on  the  order  of  the  matrix,  which  has  to  be  (M+l)x(MfI) .  This 
causes  eigenvector  polynomial  G(z)  to  be  of  degree  M.  We  know  from  the  pre¬ 
vious  chapter  and  intuition  that  the  degree  L  of  G(z)  needs  to  be  greater  than 
M  to  improve  the  accuracy  of  the  parameter  estimates.  (2)  His  algorithm  can 
only  be  used  for  estimating  the  frequencies  of  undamped  sinewaves.  This  is 
because  he  forces  the  estimated  autocorrelation  matrix  to  be  Toeplitz.  The 


effect  of  this  is,  that  the  polynomial  constructed  with  the  elements  of  the 
eigenvector  corresponding  to  the  minimum  eigenvalue  (if  it  is  unique)  of  the 
Toeplitz  matrix  will  have  zeros  on  the  unit  circle  [81].  Obviously,  zeros  of 


such  a  polynomial  cannot  be  used  to  estimate  the  parameters  (s^)  of  damped 
sinusoidal  signals.  (3)  Also,  Pisarenko's  method  will  give,  in  general, 
biased  values  of  the  frequencies  of  even  noiseless,  undamped  sinewaves  for 
finite  data  records.  The  last  two  problems  are  caused  by  forcing  the  estimated 
autocorrelation  matrix  to  be  Toeplitz. 


4.3.2:  BASIS  FOR  IMPROVING  PISARENKO'S  METHOD: 


The  following  observations  form  the  basis  of  our  improvements  to  Pisarenko 
method.  Recall  that  in  chapter  2  (see  propositions  1,  2,  and  3)  we  pointed  out 
that,  if  the  data  is  noiseless  and  if  a  vector  £'  (gg,  the  first  element  need 
not  be  unity)  satisfies  the  homogeneous  equation. 


A'^'-o 


4.14 


then  the  signal  parameters  can  be  exactly  recovered  from  the  zeros  of 

L  -k  t 

G(z)  ■  Z  g,  z  .  Premultiplying  the  above  equation  by  A  '  we  get 

k-0  *  . 

A'V^'-O 


Since  the  rank  of  A'  is  M  (see  chapter  2),  A  +A'  will  have  only  M  non  zero  eigen 
values.  Therefore,  the  above  is  an  eigen  equation  corresponding  to  a  zero  eigen 
value  (which  has  a  multiplicity  L+l-M)  and  £'  is  an  eigenvector.  It  is  possible 
to  find  L+l-M  such  eigenvectors.  We  shall  say  that  these  vectors  span  the  null 
or  noise  subspace.  jj'  can  be  chosen  as  any  vector  lying  in  this  subspace.  If 
L  is  chosen  equal  to  M,  then  £'  will  be  unique.  The  maximum  value  of  L  can  be 
N-M  or  N-M/2  depending  on  the  type  of  matrix  used  (see  chapter  2,  section  2.1). 

This  is  to  be  contrasted  with  the  Pisarenko  method  where  the  Toeplitz 


correlation  matrix  R,  with  elements  r^ 


£  l  (yOO  y*(k-i)) ,  |  i |  <M+1 ,  is 
k*l 


used.  The  eigenvector  polynomial  corresponding  to  the  minimum  eigenvalue  of  R, 
even  if  the  data  is  noiseless,  will  not  give  the  exact  value  of  (s^)  for  finite 
data  records,  in  general.  Therefore,  we  shall  use  in  our  improvements  the  cor¬ 
relation  matrix  A’^A'  instead  of  R  since  its  eigenvector  polynomial  gives  the 
exact  parameters,  at  least  when  the  data  is  noiseless,  independent  of  the  type 
of  the  signal. 

Secondly,  as  we  saw  earlier,  a  polynomial  G(z)  with  degree  L>M  tends  to 
give  more  accurate  parameter  estimates.  This  turns  out  to  be  true  in  this  case 
also.  However,  the  value  of  L  determines  the  order  of  the  (L+l)x(L+l)  matrix 
A,+A'  and  hence  the  number  of  eigenvectors  (L+l-M)  in  the  null  subspace  or  noise 
subspace  of  A'^A' .  The  questions  are  which  of  these  vectors  should  be  used  to 
construct  a  polynomial  G(z)?  Can  they  be  combined  linearly  (if  yes,  how?)  to 
form  the  coefficient  vector  £*? 

A. 3. 3:  HOW  TO  CONSTRUCT  A  POLYNOMIAL  G(z)  IN  IP  METHOD: 

Let  us  write  the  SVD  of  A'  as  follows. 

A'  -  P  A  Q+  4.16 

m  x  (L+l)  (m  x  m)  m  x  (L+l)  (L+1  x  L+l) 

where  A  is  the  rectangular  diagonal  matrix  with  singular  values  along  the  diagonal 

which  are  the  positive  square  roots  of  the  eigenvalues  of  A,+A'  or  A'  A'"*”.  The 

value  of  m  will  be  determined  by  the  type  of  data  matrix  used.  That  is,  for  A^* 

m*2(N-L)  and  for  A£ 

of  P  and  Q,  £^,1-1 ,2, . . .m  and  ^,1-1 ,2, . . .L+l  are  the  eigenvectors  of  A'  A'"*"  and 
A'^A',  respectively. 

Any  one  of  the  eigenvectors  in  the  noise  subspace  *-3^+2 *  •  *  *%+!  Can 
used  to  construct  an  eigenvector  polynomial.  Examples  are  given  in  figure  4.6- 
4.9  where  different  noise  subspace  eigenvector  polynomial  zeros  are  plotted  for 
50  different  trials.  Note  that  here  too  (as  in  the  modified  Prony  method),  we 


A/,  m«(N-L) .  P  and  Q  are  unitary  matrices.  The  columns 


Figure  A. 6:  The  zeros  of  a  noise  subspace  eigenvector  polynomial  Q(z)  ■  q^+1  ^z 

%H*(qL+l,l’  qL+l,2, . qL+l  ,L+1)T*  *L+l  is  the  L+1  St  •i8«"tector  of  A^A'b 

SNR” 30  db.  L*18,  the  same  data  sets  in  figure  3.4  are  used  in  figures  4.6-4.10 
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face  Che  same  problem  of  distinguishing  Che  signal  zeros  from  Che  extraneous 
ones.  The  procedure  we  outlined  in  section  3.2.2  can  be  used  to  distinguish 
the  L-M  extraneous  zeros  of  these  eigenvector  polynomials. 

Instead,  since  each  noise  subspace  eigenvector  is  a  potential  candidate 
from  which  the  signal  parameters  can  be  estimated,  they  can  be  combined  to 
form  a  single  vector  .  That  is,  £*  can  be  formed  as  a  linear  combination  of 


%+l  2* '**%+! 


L+l-M 

*  *  bk  *k4M 

k*l 


4.17 


How  do  we  choose  the  b^'s?  There  are  several  choices.  For  example,  b^*l, 
k-1,2,...  L+l-M  or  the  b^'s  can  be  set  equal  to  the  singular  values  6^^  ^M+2 

•*,5l+i  etc* 

Another  alternative  is  to  find  the  b^'s  by  imposing  the  constraints  used 

2  2  2 

in  chapter  2;  i.e.,  gQ-l  and  |gt|  +  | g2 |  +  ...  +  |gj|  is  minimum.  Then, 
finding  the  b^'s  subject  to  these  constraints  is  a  standard  minimization 
problem  [82].  But  the  same  vector  £'  can  be  found  more  easily  using  the  fol¬ 
lowing  observation,  jj/  must  be  orthogonal  to  the  M  principal  eigenvectors 
jgLi ,3.2 » •  •  *.3*i  A'*A'  because  it  lies  in  the  span  of  % ,  ^ ( 2 *  *  •  • 1 1  ‘  There¬ 

fore,  the  following  inner  products  can  be  set  to  zero. 

0  1*1,2,. . .M 

Let  us  call  the  j  th  element  of  the  i  th  eigenvector  q 
above  equations  as  follows. 


qi  *■' 


h  y 
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We  can  rewrite  the 


* 

* 

* 

ill 

q12  •••* 

•••*  ql  L+l 

* 

* 

* 

21 

• 

• 

• 

• 

• 

CM 

CM 

cr 

••••  q2  L+l 

• 

• 

• 

• 

HM1 


* 

qM2 


qM  L+l 


1 

- 

0 

81 

0 

g2 

• 

• 

• 

®L 

0 
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Taking  the  first  column  of  the  above  matrix  to  the  right  side  we  can  rewrite  the 


equation  as, 


*  *  * 

m  ■ 

* 

q12  q13  .  qlL+l 

81 

m 

qll 

*  *  * 

* 

q22  q23  .  q2L+l 

•  •  • 

82 

• 

q21 

• 

e  •  • 

*  *  * 

• 

• 

* 

qM2  qM3  qML+l 

®L 

qMl 

T  T 

and  in  matrix  form  as,  (£’*(1,£  )  ), 


4.20 


C  £  -  -c  4.21 

It  might  be  useful  to  weight  the  above  equations  4.19  based  on  the  size  of  the 

corresponding  singular  values.  That  Is,  the  first  equation  is  multiplied  by  the 

first  singular  value  and  so  on.  The  reason  is  that  the  eigenvectors  corresponding 

to  the  larger  singular  values  tend  to  be  less  noisy.  Since  L>M,  the  above  system 

has  more  unknowns  than  equations.  The  minimum  norm  solution  (which  minimizes  the 
2  2  2 

quantity  |g^|  •  +  . ...  +  Ig^J  as  desired)  can  be  found  using  the  pseudo¬ 

inverse  of  C. 


£  -  -C+[CC+]”1c 


4.22 


But  CC^  is  I-c  c+  which  follows  from  the  orthonormality  of  the  eigenvectors  of 


A'  A'.  Therefore  the  expression  for  £  can  be  simplified  to 

£ 


C+c 


4.23 


1-  lie  H' 


Thus  the  desired  £  vector  can  be  obtained  from  the  M  principal  eigenvectors  of 

i  * 


A'  A'  by  linearly  combining  the  last  L  elements  of  each  vector.  An  equivalent 
expression  for  £  can  be  obtained  by  using  the  L+i-M  noise  subspace  eigenvectors 
as  well.  Figure  4.10  shows  the  zeros  of  a  polynomial  formed  with  the  £  vector 
computed  as  per  equation  4.23.  Note  that  the  effect  of  using  the  said  constraints 
is  to  make  the  L-M  extraneous  zeros  fall  inside  the  unit  circle. 


Figure  4.10:  The  zeros  of  G(z)  obtained  by  IP  method.  In  this  case,  the  £  vector 
was  calculated  as  In  equation  4.23.  That  is  £'  lies  in  the  space  of  all  the  L+l-M 
noise  subspace  eigenvectors  of  A£^A^.  Also  the  coefficients  of  G(z)  are  constrained 
to  be  g„-l,  JgJ2  +  I g2 ) 2  + . +  1*1,1  *  18  8  mininu®* 
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4.3.4:  SUMMARY: 

Any  eigenvector  corresponding  to  the  L+l-M  smaller  eigenvalues  of  A,+A*  is 

orthogonal  to  the  signal  subspace  approximately.  Therefore,  a  polynomial  con- 

sk 

structed  with  any  one  of  the  L+l-M  eigenvectors  will  have  zeros  (close  to)  e 
-sk* 

(or  e  ,  in  the  case  when  A^  is  used),  k«l,2,...M.  The  L+l-M  eigenvectors  can 
be  linearly  combined  to  form  a  single  vector  £'  in  several  ways.  One  particularly 
interesting  way  is  by  constraining  the  coefficients  of  G(z) ,  which  are  the  ele- 
ments  of  £' ,  to  be  g0“l,  and  |g^|  +  | g2 |  +  ...  +  |g^|  to  be  minimum.  This 
helps  us  identify  the  signal  zeros  from  the  extraneous  zeros.  Using  these  con¬ 
straints  makes  the  IF  method  resemble  (in  performance  as  well)  the  TK  method 
closely.  Some  other  choice  for  the  b^'s  which  retains  the  'homogenity'  (see 
Henderson  [57]  for  explanation)  may  be  preferable.  In  this  case,  the  elimination 
of  the  L-M  extraneous  zeros  is  a  problem.  It  can  be  solved  using  the  method  in 
3.2.2. 

In  the  next  section  we  discuss  the  choice  of  the  value  of  L  for  a  given 


number  of  data  samples 


4.4:  CHOICE  OF  THE  VALUE  OF  L  AND  A  SPECIAL  CASE  AT  THE  EXTREME  VALUE  OF  L: 

Choosing  Che  best  value  of  L,  the  degree  of  G(z) ,  is  crucial  to  achieving 
the  best  parameter  estimation  performance.  The  value  of  L,  for  a  given  value  of 
N,  determines  the  order  of  the  data  matrices  A  or  A' .  Earlier,  in  chapter  3, 
we  observed  that  the  parameter  estimation  accuracy  increased  with  increase  in 
the  value  of  L.  We  also  observed  that  L  should  be  less  than  or  equal  to  N-M 
(or  N-M/2  in  the  case  of  forward-backward  equations),  where  M  is  the  number  of 
signal  components  in  the  data.  The  interesting  question,  however,  is  how  to 
choose  the  value  of  L  for  a  given  N  to  achieve  the  best  possible  performance. 

We  were  not  able  to  obtain  an  expression  for  best  L,  in  terms  of  N,  M,  SNR,  and 
the  signal  parameters.  However,  to  see  the  tradeoffs  involved  in  the  choice  of 
L,  we  shall  present  a  simple  example. 

4.4.1:  AN  EXAMPLE: 

In  this  example  we  shall  derive  a  simple  expression  for  the  first  order 
perturbations  on  the  signal  zero  of  G(z)  as  a  function  of  the  perturbations  on 
the  coefficients  of  G(z) ,  and  the  value  of  L.  Recall  that  the  signal  parameter 
estimates  are  obtained  from  the  locations  of  the  signal  zeros.  First  we  shall 
calculate  the  coefficients  of  G(z)  when  the  data  is  noiseless. 

Let  us  consider  the  case  when  the  data  is  a  complex  sinusoid. 

y(n)  -  e^Wln  n*l,2,...  4.24 

w^  is  the  angular  frequency  of  the  sinusoid.  We  can  write  the  linear  prediction 
equations  (for  L>M-1)  as, 

L 

I  y(n-k)  g.  -  -y(n) ,  n-L+1  4.25 

k-1 

or  in  matrix  form 

“  “if  4.26 

Since  the  signal  sequence  has  one  complex  sinusoid,  one  equation  is  necessary 
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ta 


and  sufficient  to  find  the  coefficients  g  ,k“l ,2, . . .L.  The  only  signal  subspace 


eigenvector  of  A  A  can  be  shown  to  be 


vL  -  (l/L)*5  (1,  eJwl.  e2jwl,  ...  e(L”1)jwl)T 


Further  the  minimum  norm  solution  vector  £  can  be  shown  to  be 

(-1/L)  a2j”l,  ...  .LJ“1)T 


Therefore  the  polynomial  G(z)  is 

G(z)  -  1  -  E  (1/L)  eJwik  a‘k 
k-1 


4.27 


4.28 


4.29 


Let  z^,  Z2 . z^  be  the  zeros  of  G(z).  z^  ■  e^Wl  is  the  signal  zero. 

Now  we  can  study  the  first  order  perturbation  of  the  zeros  of  G(z)  for 

incremental  changes  in  its  coefficients  by  using  the  formula  (see  for  e.g.  [20]) 

L  «-k>, 


-  E 


dZj 


k-lzi 


4.30 


-1) 


n  (1-z.z 
k-1  *  x 

k*i 

where  Agfc  are  the  perturbations  on  the  coefficients  of  G(z) .  If  G(z)  has  tightly 
clustered  zeros,  small  changes  in  g^  could  cause  large  changes  in  the  zeros  of 
G(z)  since  the  denominator  will  tend  to  be  small.  Note  that  the  minimum  norm 
coefficient  vector  we  have  used  in  our  methods  is  advantageous  in  this  respect 
also  since  it  tends  to  distribute  the  zeros  around  the  unit  circle  (see  chapter 
2)  making  the  denominator  large  in  the  above  formula.  Specializing  the  above 

i  VI 

formula  to  the  signal  zero  z.  -  eJ  1 ,  we  get 

L 


where 


dz, 


G’(z) 


-  2  z  (1-k)Ag 
k-lzl  agk 

G' (z)  at  z-z. 


-  n  (i-z  z  "l> 
k-2  K  L 

L-k  Jwik  -k 

-  II  — : —  eJ  1  z 

k-0 


4.31 


4.32 
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Therefore,  after  some  simplification  dZj  is  given  by  the  formula, 


dzl  " 


-2  Vj 


where 

Ai  ■  (Agx »  Ag2»  ••••  a8l)T  4.34 

a£  is  the  vector  of  perturbations  on  the  coefficient  vector  £.  Tha  magnitude  of 
dz^  determines  the  perturbation  of  the  zero  z^  and  hence  the  accuracy  of  the 
estimate  s^*jw^.  There  are  two  factors  which  determine  |dz^|. 

(1)  Clearly,  as  the  value  of  L  is  increased  |dz^|  tends  to  be  smaller, 
since  L  occurs  in  the  denominator. 

(2)  But,  |v^Ag|  which  appears  in  the  numerator  tends  to  be  larger  as  L  is 
increased.  To  see  the  effect  of  L  qualitatively,  let  us  take  the  case  of  TK 
method.  A  similar  case  can  be  iiade  for  the  IP  method  as  well.  Recall  that  in 
the  TK  method,  the  £  vector  (£  is  the  same  as  £  vector.  The  A  on  matrices, 
vectors,  and  scalars  is  used  to  denote  the  presence  of  noise  in  the  data  (in 
this  subsection  only))  is  calculated  as  (see  equation  4.13) 

£  -  -(1/Xj)  (vjr)^  -  c^  4.35 

where  c^  *  -(v^j:)/X^,  X^,  v^  are  the  eigenvalues  and  eigenvectors  of  A  A  and 
r^A  h.  In  the  absence  of  noise  in  the  data  the  £  vector  is  given  by 

£  -  —  ( 1  /X x )  (v^r)^  -  c^  4.36 

where  Cj  -  -(v^r)  /X^. 

Therefore 


A£  -  £  -  £  -  -  CjVj 

Superscript  'TK'  denotes  d£  vector  corresponding  to  TK  method. 
Premultiplying  by  v^  we  get 

VjA^  -  ^(v^)  -  Cj,  || Vj  ||  -  1 

TK 

Therefore  Ag.  is  primarily  a  function  of  noise  in  the  first  eigenvalue  and 


eigenvector  of  A  A.  This  is  dependent  on  L  because,  for  a  fixed  value  of  N,  the 
perturbation  in  each  element  of  A^A  is  dependent  on  the  value  of  L.  For  example, 
the  (i,j)  th  element  in  AfAf  is  Iy*(k-i+l)y(k-j+l) .  Therefore  the  larger  the 

£  t  k 

value  of  L,  smaller  is  the  averaging  and  hence  larger  is  the  relative  perturba¬ 
tion  on  ^  and  v^,  leading  to  larger  |dZjJ. 

Hence  there  are  two  factors  working  in  the  opposite  direction  as  L  is  in¬ 
creased.  Therefore,  we  can  expect  an  optimum  intermediate  value  of  L  in  between 
M  and  N-M  (or  N-M/2) . 

4.4.2:  A  SPECIAL  CASE:  [92] 

In  the  previous  section,  we  argued  that  a  compromise  value  of  L  in  between 
M  and  N-M  (or  N-M/2)  should  be  used  to  minimize  the  perturbation  on  the  signal 
zero(s).  However,  a  special  case  occurs  at  the  extreme  value  of  I  (-N-M  for 
matrices  A^  and  A^  and  N-M/2  for  A^) •  The  advantage  of  choosing  this  particular 
value  of  L  is  that  no  SVD  calculations  are  required  in  computing  the  coefficient 
vector  £.  Also,  for  this  extreme  value  of  L  the  coefficient  vectors  £  (minimum 
norm  vector)  given  by  the  three  methods  (MP,TK,IP)  are  the  same. 

For  example,  let  us  assume  that  the  data  consists  of  (M-)  4  complex  sine- 
waves  and  noise.  Let  us  write  down  the  (forward-backward)  linear  prediction 
equations  A£»-h  with  L«N-M/2*N-2. 


y(N-2)  y(N-3)  . y(l)  gj  -  -  y(N-l) 

y(N-l)  y(N-2)  . y(2)  g2  y(N) 

y*(2)  y*(3)  . y*(N-l)  .  y*(l)  4.39 

y*(3)  y*(4)  . y*(N)  .  y*(2) 

®L 


Since  the  matrix  A  has  only  (M-)4  rows,  the  rank  of  A  is  M  (assuming  that  N>2M) . 
Therefore,  in  this  case  A«A^  and  hence  the  SVD  calculations  to  find  a  matrix  of 
rank  M  are  unnecessary.  The  coefficient  vector  £  can  be  determined  by  using  the 


pseudoinverse  of  A  directly. 

£  -  -A#h  -  -A+(AA+)_1h  4.40 

The  pseudoinverse  A^*A^(AA^)  ^  is  used  in  this  case  (instead  of(A^A)^A^)  because 
M  being  far  less  than  L,  the  matrix  AA^  will  be  a  small  order  matrix  and  can  be 
inverted  simply. 

This  simplification  in  computation  goes  with  some  reduction  in  performance 
compared  to  the  choice  of  the  ’optimum'  value  of  L  (in  between  M  and  N-M  or  N-M/2) 
The  reason  for  this  is  that  although  the  maximum  allowed  value  of  L  is  used  (note 
that  L  occurs  in  the  denominator  of  equation  4.33) ,  the  perturbation  vector  £ 
tends  to  be  longer.  This  is  due  to  the  increased  noise  in  the  principal  eigen¬ 
vectors  of  A*A  (though  we  do  not  calculate  them  explicitly) ,  since  the  elements 
of  A+A  are  averaged  over  only  M  terms. 
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4.5:  RELATIONSHIP  TO  OTHER  WORKS: 

Several  authors  In  the  past  have  used  the  following  two  basic  Ideas  we  have 
used  In  this  chapter. 

T 

(1)  If  a  vector  £*  ■  (g^,  g^ . g^)  is  perpendicular  to  a  set  of 

-sw  -2sv  -Lsv  T  ^  _k 

vectors,  (1,  e  ,  e  .  e  )  ,  k-I,2,...M,  L  M,  then  G(z)  -  Z  g.z 

sk  k*° 

has  zeros  at  e  *  k«l,2,...M.  This  Is  Prony's  idea. 

(2)  The  M  principal  eigenvectors  of  an  autocorrelation  matrix  (A^A  or  A'^A.') 
are  more  robust  to  noise  perturbations  and  they  contain  the  relevant  signal  infor¬ 
mation  for  high  enough  SNR.  In  this  section  we  shall  describe  the  connection  be¬ 
tween  our  results  and  those  of  others. 

4.5.1:  THE  USE  OF  PRINCIPAL  EIGENVECTORS: 

The  principal  eigenvectors  of  an  autocovariance  matrix  have  been  used  to 
advantage  in  other  ar«4S  of  research.  In  multivariate  statistics,  when  a  large 
number  of  measurements  are  available,  the  question  often  asked  is  whether  if 
these  could  be  replaced  by  a  'fewer  number  of  measurements  or  of  their  functions 
without  loss  of  much  information,  for  convenience  in  analysis  and  interpretation 
of  data'.  The  topic  of  principal  component  analysis  regards  linearly  transform¬ 
ing  the  data  along  the  first  few  principal  eigenvectors  of  the  covariance  matrix 
of  the  data  [94].  The  original  work  in  this  area  is  due  to  Hotelling  [85].  Rao 
[86,95]  showed  that  several  problems  in  multivariate  analysis  have  solutions  based 
on  calculating  the  first  few  eigenvectors  of  a  correlation  matrix. 

SVD  has  become  one  of  the  most  important  tools  of  modern  numerical  algebra 
[61].  Stable  solutions  to  linear  equations  Ax*£  are  routinely  obtained  by  using 
SVD  of  A  and  retaining  only  the  first  few  singular  values  and  the  corresponding 
eigenvectors  of  A  if  the  rest  of  the  singular  values  are  Insignificant.  Deter¬ 
mining  what  is  'insignificant'  is  difficult  and  is  problem  dependent.  SVD  is 
also  used  in  determining  the  numerical  rank  of  a  matrix  [51].  Some  of  the  ori¬ 
ginal  work  in  this  area  is  due  to  Golub  [ 84 ] . 


4.5.2:  RELATED  RESEARCH  IN  THE  AREA  OF  SENSOR  ARRAY  SIGNAL  PROCESSING: 

Most  of  the  work  in  signal  processing  research  which  involves  eigenvalue- 
eigenvector  decomposition  of  correlation/covariance  matrices  is  related  to  sensor 
array  signal  processing.  In  this  area,  a  problem  is  to  find  the  angles  of  ar¬ 
rival  of  plane  waves  at  a  line  array  of  equally  spaced  sensors. 

Let  us  assume  that  we  have  L+l  equally  spaced  sensors  and  the  waveforms 
received  at  these  sensors  are  linear  combinations  of  M  incident  plane  wavefronts 
and  noise.  These  waveforms  (after  frequency  translation)  are  sampled  at  regular 
intervals  in  time.  Let  ^  denote  the  nth  snapshot  of  the  waveforms  at  the  time 
instant  n. 

^n  "  ^yn^  yn^  . yn(L+l))T  4.41 

The  nth  snapshot  (see  ref.  [66])  at  the  kth  sensor  consists  of  independent  noise 

2 

samples  w  (k) ,  with  variance  a  ,  plus  M  source  voltages. 

M  .. 

y  (k)  -  £  a.  eJ  Wi  +  w  (k) ,  k-l,2,...L+l  LiM  4.42 

n  ^  m  n 

where  a^n  is  the  complex  amplitude  of  the  voltage  induced  at  the  i  th  sensor  at 
the  n  th  instant,  w^  ■  2II(qsin(0^) /X)  ,  where  0^  is  the  angle  of  arrival  of  the 
i  th  plane  wave,  q  is  the  spacing  between  the  sensors  and  X  is  the  wavelength  of 
propagation.  The  problem  is  to  determine  w^'s  given  K  snapshots  ^***"1  *2,  • .  .K. 
Note  that  once  w^'s  are  determined  0^'s  are  also  determined.  If  only  one  snap¬ 
shot  ^  is  available  for  processing,  then  the  problem  reduces  to  our  original 
problem  described  in  section  1.1  (specialized  to  undamped  sinewaves  and  with 
L+1*N) .  Several  twists  can  be  introduced  into  the  above  problem;  for  example, 

the  noise  samples  w  (k)  can  be  assumed  to  be  correlated  from  sensor  to  sensor  or 

n 

the  a^n's  can  be  temporally  correlated  or,  in  the  limiting  case,  they  can  be  sine- 
waves  in  the  temporal  dimension  [96],  The  covariance  matrix  R  is  given  by 

R  ■  E(^n  4.43 

where  E  denotes  the  expectation  operator.  Assuming  that  the  noise  samples  are 
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uncorrelated  from  sensor  to  sensor,  R  can  be  written  as  (see  ref.  [68],  for 
example) 

M  +2 

R  -  I  c  S  S  +  o  I  4.44 


m,n*l 


mn 


where  the  (MxM)  matrix  C  (with  elements  c  )  is  the  covariance  matrix  of  the 

mn 

source  signals,  s^,  k«l,2,...M  are  the  direction  vectors  associated  with  the  M 


sources . 


-  (1.  eJwk,  e2juk . eLjwk)T 


4.45 


where  w^  are  related  to  the  angles  of  arrival  9^. 


The  methods  for  estimating  the  w^'s  based  on  the  eigenstructure  of  R  uti¬ 
lizes  the  fact  we  have  been  using  in  this  dissertation,  i.e.,  if  a  vector  is 

orthogonal  to  s^,  (i.e.,  s^  g'*0,k*l , . . .M)  then  G(z)  has  zeros  at  eJ  K,  k=l,2,...M. 

2 

The  L+l-M  eigenvectors  of  R  (corresponding  to  the  eigenvalue  a  )  have  this  property 
[68].  If  L-M,  then  there  is  only  one  such  eigenvector.  This  case  coincides  with 
Pisarenko's  [60].  Several  researchers  [60,64-72,74,75]  have  used  this  property. 

The  difference  between  these  works  and  ours  is  that  we  realized  that  the 
above  properties  are  adaptable  for  processing  N  samples  of  a  scalar  time  series 
which  consists  of  M  exponential  signals  (damped  or  undamped).  That  is,  by  form¬ 
ing  a  matrix  A'^A'(as  in  chapter  2  or  4)  where  A'  is  a  ((N-L)x(L+l)  or  2(N-L)x(L+l) ) 
data  matrix  using  the  N  samples  of  data,  we  found  that  the  eigenvectors  of  A'  A' 
have  similar  properties  to  R  in  equation  4.44.  In  addition  we  could  choose  the 
value  of  L,  which  alters  the  order  of  A'+A'  and  the  degree  of  the  polynomial  G(z; 
to  maximize  the  accuracy  of  estimation.  Whereas,  in  this  array  problem,  the  degree 
L  of  the  polynomial  G(z)  is  dictated  by  the  number  of  sensors  and  not  by  the  ob¬ 
servation  time. 

Usually,  the  matrix  R  in  equation  4.44  is  not  known  exactly.  It  is  esti¬ 
mated  from  the  K  snapshots  observed  at  the  output  of  the  array  as 

-  K  t 
R “  E  LJL 

n-L 
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Now  we  shall  offer  some  specific  comments  on  some  eigenvector  based  methods 
which  are  closely  related  to  our  work.  We  shall  call  the  eigenvectors  of  R, 

. ^L+l'  T*ie  principal  eigenvectors  of  R,  e^,  •  •  •  e^  are  said 

to  span  the  signal  subspace  and  e^^ 
subspace. 

Pisarenko  [60]:  He  was  the  first  to  realize  the  important  properties  of 
the  eigenvectors  of  the  correlation  matrix  R  in  equation  4.44.  But  the  algorithm 
he  proposed  restricts  the  value  of  L  to  M  and  hence  the  accuracy  of  the  estimates 
obtained  using  his  method  is  poor. 

Cantoni  and  Godara  [64]:  They  allowed  for  L  greater  than  M  unlike 
Pisarenko.  The  L«M  extraneous  zeros  of  the  eigenvector  polynomials  are  elimi¬ 
nated  by  calculating  the  powers  associated  with  each  zero.  The  null  space  eigen¬ 
vectors  are  not  combined  to  form  a  single  polynomial  as  done  in  Reddi's  method 
[68]  described  below,  or  our  IP  method  (see  section  4.3) .  Hence  they  do  not  ex¬ 
ploit  the  SNR  improvement  that  stems  from  using  all  the  L+l-M  noise  subspace 
eigenvectors  or  equivalently  the  M  signal  subspace  eigenvectors. 

Owsley  [72,82]:  He  was  one  of  the  first  researchers  to  use  eigenvector 
methods  to  find  source  directions.  But  he  used  discrete  Fourier  transform  pro¬ 
cessing  on  the  M  principal  eigenvectors  of  R.  Hence  his  was  not  a  'high  reso¬ 
lution'  method  in  that  it  cannot  resolve  closely  spaced  sources.  He  realized 
this  in  recent  work  [97]. 

Liggett  [73]:  He  was  the  first  researcher  to  use  the  eigenvalues  of  R  to 
determine  the  number  of  signal  sources.  He  fitted  a  multivariate  signal  model 
to  the  observed  data  unlike  others.  His  wavefront  model  is  more  general  than 
the  plane  wavefront  model  used  by  others. 

Schmidt  [67]:  He  has  suggested  a  method  by  which  a  high  resolution  angular 
spectrum  p(w)  is  computed  using  the  L+l-M  noise  subspace  eigenvectors.  In  this 


,  e^ j 1 2 »  •••  are  8a*<*  t0  sPan  the  noise 


method  p(w)  is  computed  as 


where  E,  (e^w) 
vector  of  R. 


p(w) 


1 


L+l 

Z 

k-ttfl 


2 
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L 


Own 

z«e 

That  is, 


jw‘ 


e^.m-0,1 ,2. ♦  .L  are  the  elements  of  the  k  th  eigen- 


■%c  *  fek0’  “kl*  ”*  eklJ  4,48 

Since  each  noise  subspace  eigenvector  polynomial  will  have  zeros  at  e^^, 

k»l,2,...M  on  the  circle  |z|  *1  (if  R  was  known  exactly),  it  will  produce 
spikes  in  p(w)  at  those  angular  locations.  He  also  alludes  to  the  frequency 
estimation  problem  in  the  case  of  a  scalar  time  series,  briefly.  It  is  not 
clear  how  he  formed  the  correlation  matrix  in  this  case.  As  we  have  seen  in 
section  4.4  and  in  ref.  [54],  this  has  a  tremendous  effect  on  the  accuracy  of 
the  frequency  estimates.  His  method  is  applicable  only  to  undamped  sinewaves. 
Reddl  [68]:  Reddi's  method  is  closely  related  to  ours.  He  uses  the  M 

principal  eigenvectors  e^,  .  e^  of  R  to  determine  a  vector  j>'  which 

lies  perpendicular  to  these.  Hence  his  method  is  essentially  identical  to  our 
IP  method  (see  section  4.3).  He  fixes  the  first  element  of  £'  to  unity  and  com¬ 
putes  the  rest  of  the  elements  of  £*  using  a  pseudoinverse.  Hence  he  implicitly 

2  2  2 

uses  the  constraints  on  the  polynomial  G(z) ,  gg*l  and  |g^|  +  ^1  +183!  + 

2 

.  +  |g^|  is  minimum,  though  he  did  not  realize  it.  This  explains  his  ob¬ 
servation,  'that  the  extraneous  roots  of  the  principal  polynomial  (G(z))  rarely 
lie  on  the  unit  circle'.  This  follows  from  our  results  in  chapter  2. 

Further  comparison  of  these  eigenvector  methods  can  be  found  in  our  paper 


[98]. 


4.5.3:  OTHER  WORKS: 

The  following  papers  deal  with  a  scalar  time  series  (in  constrast  to  those 


in  the  previous  section)  and  are  closely  related  to  our  work. 


78 


Henderson  [57]:  He  computes  the  smaller  (noise  subspace)  singular  values 
and  the  corresponding  eigenvectors  of  a  data  matrix  A'.  But  instead  of  computing 
an  L  th  degree  polynomial  G(z)  as  we  do  in  section  4.3,  he  reduces  the  degree  of 
the  polynomial  to  M  by  using  a  Gaussian  elimination  procedure  on  the  L+l-M  noise 
subspace  eigenvectors  of  A'^A'.  Although  this  eliminates  the  nuisance  caused  by 
the  L-M  extraneous  zeros  of  G(z),  it  results  in  poor  parameter  estimates  [99]. 

Van  Blaricum  [35,100]:  In  ref.  [35]  he  compared  the  performance  of 
Henderson's  method  (above)  with  his  method  [100]  (which  is  essentially  Pisarenko's 
idea  of  using  the  minimum  eigenvalue-eigenvector  polynomial).  In  ref.  [35]  he 
raised  relevant  questions  such  as  how  to  combine  the  L+l-M  noise  subspace  eigen¬ 
vectors  of  A'^A'  and  how  to  eliminate  the  L-M  extraneous  zeros  of  G(z) ,  but  did 
not  provide  effective  answers.  We  provide  answers  to  some  of  his  questions  in 
section  4.3. 


Holt  and  Antill  [76]:  They  suggested  a  SVD  based  method  to  determine  the 
number  of  signal  components  in  the  data  using  Prony's  method.  They  had  essen¬ 
tially  suggested  the  same  method  we  called  the  TK  method  in  section  4.2.  How¬ 
ever,  they  missed  a  crucial  point,  in  that  they  did  not  realize  the  importance 
of  using  a  value  of  L  larger  than.M.  But  the  numerical  example  they  chose  has 
a  SNR  well  below  the  threshold  SNR  (our  numerical  calculations  show  this) .  There¬ 
fore,  even  with  an  optimum  choice  of  L  (which  they  did  not  use)  they  would  not 
have  obtained  accurate  parameter  estimates. 

Gueguen  et  al.  [101]:  In  this  paper  the  authors  present  a  sequence  of 
constrained  minimization  problems  which  includes  various  linear  prediction  methods 
and  Pisarenko  method.  They  restate  the  linear  prediction  problem  as 

£,+R'  £'  with  g«l  4.49 


Minimize 
w.r.t  £' 


If  R'*A'+A',  the  above  problem  is  identical  to  the  forward,  backward  or  forward- 


backward  linear  prediction  methods  depending  on  whether  A'  equals  A^  or  A^  or  A^, 
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respectively,  Pisarenko's  method  is  restated  as 


Minimize 
w.v.t  £' 


£,+R'  £  with  £,+g.' 


4.50 


The  vector  which  minimizes  the  above  quadratic  form  can  easily  be  shown  as 

the  eigenvector  corresponding  to  the  minimum  eigenvalue  of  R'.  With  respect 

to  the  above,  our  IP  method  (in  section  4.3)  can  be  rewritten  as 

Minimize  e'^RA  g'  with  g  "1  and  j5|G(z)|^dz  is  minimum  4.51 

w.r.t  m  U 

where  R^  is  the  rank  M  approximation  to  R'.  The  minimum  value  of  the  quadratic 
form,  of  course,  will  be  zero. 

The  actual  improvement  in  performance  of  our  methods  over  those  given 
above  stem  from  properly  choosing  the  value  of  L,  from  using  the  M  principal 
eigenvector  of  R'  (i.e.  R^  instead  of  R')  and  from  choosing  the  appropriate 
type  of  data  matrix. 
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CHAPTER  5:  SIMULATION  RESULTS: 

In  this  chapter  we  quantify  the  performance  of  different  algorithms  we 
have  proposed  and  others  by  measuring  the  bias  and  mean-square  error  of  the 
parameter  estimates  (s^)  over  500  independent  trials.  The  mean  square  errors 
of  the  estimates  are  compared  to  the  Cramer-Rao  (CR)  bound.  The  CR  bound  speci¬ 
fies  the  lower  bound  on  the  variance  of  (unbiased)  parameter  estimates.  There¬ 
fore  these  bounds  serve  as  a  goal  for  our  estimation  algorithms.  The  data  samples 
for  the  simulation  experiments  are  generated  in  a  computer  as  described  below. 

SIMULATION  1:  UNDAMPED  SINUSOIDS: 

In  this  simulation  25  data  samples  were  generated  using  the  following  formula. 

y(n+l)  ■  a^e-^^n  +  a ^e^2^211  +  w(n)  ,  n-0, 1 ,2, . . .24  5.1 

where  fj  ■  0.52  Hz,  f 2  «  0.5  Hz,  aj^  *  e^1,  4^  »  JI/4,  a2  »  e^2,  4>2  *  0.  w(n) 

is  a  computer  generated,  white,  complex  Gaussian  noise  sequence  with  variance 
2  2 

2a  .  a  is  the  variance  of  the  real  and  imaginary  part  of  w(n) .  SNR  is  defined 
2  2 

as  10  log^Q  (|a^|  /2a  ).  For  every  trial  (500  such  trials  were  used)  the  signal 
part  of  the  data  sequence  y(n)  is  fixed  but  w(n)  is  different. 

SIMULATION  2:  EXPONENTIALLY  DAMPED  SINUSOIDS: 

For  this  simulation  the  following  formula  is  used  in  data  generation. 

y(n+l)  -  aie(_al+;j2nfl)n  +  a2e(_a2+:i2nf2)n  +  w(n) ,  5.2 

n*0,l ,2. . .24 

Oj  *  0.1,  f^  ■  0.52Hz,  a^  *  e^l,  *  0 
a2  *  0.2,  f2  ■  0.42Hz,  a2  *  e^2,  ^  ■  0 


where 
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The  characteristics  of  the  noise  sequence.  w(n)  are  the  same  as  above.  The  SNR 

O 

is  defined  as  10  log  (|a^j  /2 a)  .  Note  that  this  is  the  peak  SNR. 

5.1:  DIFFERENT  METHODS  USED  IN  THIS  SIMULATION: 

MAXIMUM  LIKELIHOOD  (ML)  METHOD: 

In  this  method  the  error  E 

E-  Z  |y(n)  -  I  Le^kC-Dj2  5.3 

n*l  k“l 

AAA  A 

is  minimized  by  finding  f^,  a^  and  a ^  using  a  search  procedure  [18].  It  is 
used  here  for  estimating  frequencies  of  undamped  sinusoids  only  (simulation  1) . 

FORWARD-BACKWARD  LINEAR  PREDICTION  (FBLP)  METHOD: 

The  following  linear  equations  are  set  up  using  the  25  data  samples. 

Afb  *  3  _hfb  5‘4 

Then,  £  is  given  by 

£  *  ~<Afb  Afb)  Afb  *lfb  5,5 

The  polynomial  G(z)  ■  1+  E  g,z  is  formed  and  its  zeros  are  found.  The  two 

k-1 

roots  closest  to  the  unit  circle  are  taken  as  the  signal  zeros.  The  angles  of 
these  zeros  (2IIf ^  and  2IIf divided  by  two  pi  gives  the  frequency  estimates. 

This  method  is  applicable  to  undamped  sinusoids  only  (simulation  1). 

BACKWARD  LINEAR  PREDICTION  (BLP)  METHOD: 

This  method  is  meant  for  estimating  the  parameters  of  exponentially  damped 
sinusoidal  signals  (simulation  2) .  In  this  case  the  linear  equations  used  are 

^  £  s  -£  5'6 

The  £  vector  is  given  by 

&  “  -(aX)_14i 


5.7 


9 
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T 


X 

s 


The  polynomial  G(z)  is  formed  and  its  roots  are  calculated.  From  the  two  roots 
that  fall  outside  the  circle  |z|=l,  the  estimates  of  c^,  f^  and  and  f ^  are 
found.  A  glance  at  figure  3.5  will  help  in  understanding  this  method.  If  there 
are  more  than  two  (-M)  roots  outside  the  circle,  the  method  fails. 


MODIFIED  PRONY  (MP)  METHOD: 


In  this  method  the  linear  prediction  equations  in  equation  5.4  (for  un¬ 
damped  sinewaves;  simulation  1)  or  in  equation  5.6  (for  exponentially  damped 
sinewaves;  simulation  2)  can  be  used.  The  vector  £  is  calculated  as  before  and 
the  roots  of  G(z),  eSk,  k*l,2,...L  are  found.  Then  we  find  (M«)  2  out  of  the  L 
exponentials  e^.k-l  ,2, . .  .L  that  best  fit  the  data  in  the  sense  that  they  give 
minimum  value  for  E,  defined  below. 


24 


M 


Min 

ski 

e  ,  i«l,2,...M 


E  |y(n+l)  -  E  a,  e 
n-0  i-1  Ki 


Ski. 


5.8 


In  the  case  where  the  equation  A^sh^  is  used,  the  zeros  of  G(z)  that  are  outside 
the  circle  can  be  first  selected.  For  example,  assume  that  K  out  of  L  zeros  of 
G(z)  are  outside  the  circle.  Then  we  flip  them  inside  the  circle  (i.e.,  take 
the  mirror  image) .  Then  we  find  M*2  out  of  the  K  exponentials  which  best  fit 
the  data.  Since  K  will  be  less  than  L  the  computational  burden  is  reduced. 


TUFTS-KUMARESAN  (TK)  METHOD: 

As  before,  we  set  up  the  linear  equations  as  in  equation  5.4  (for  undamped 
sinewaves)  or  as  in  equation  5.6  (for  exponentially  damped  sinewaves).  Let  the 
following  equations  represent  either  of  these  two  sets  of  linear  equations. 

A£  s  -h  5.9 

We  compute  the  SVD  of  A.  A  *  UEV^.  We  set  the  smaller  singular  values,  q„. , 

iTT  i  f 

gjj 1 2 . . to  zero.  Then  we  compute  the  vector  £,  as 

i  ■  "  V1  <&>“k 

k*l 


5.10 
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where  u^  and  v^  are  the  columns  of  U  and  V.  Then  the  polynomial  G(z)  is  formed. 
The  rest  of  the  method  is  identical  to  FBLP  or  BLP  method. 


IMPROVED  PISARENKO  (IP)  METHOD: 

Using  the  N-25  data  samples  we  form  a  matrix  ■  (h^  A^)  t*ie  caae 

of  simulation  1  and  A£  - 
these  matrices  by  A'.  The  SVD  of  A*  is  computed.  A'  =  PAQ.  and  are  the 
columns  of  P  and  Q  respectively.  Then  the  £  vector  is  calculated  as  in  equation 
4.23.  The  rest  of  the  method  is  identical  to  the  TK  method. 

The  simulation  experiments  are  repeated  at  several  SNR  values  for  each 
method  and  they  are  tabulated  below.  The  formulas  for  CR  bounds  for  parameters 
of  undamped  sinewaves  were  obtained  by  Rife  and  Boorstyn  [19].  Ue  have  plugged 
in  our  parameter  values,  SNR,  and  N  value  in  their  formulas  to  calculate  the  CR 
bounds.  The  formulas  for  CR  bounds  for  exponentially  damped  sinewaves  are  derived 
in  appendix  C. 

5.2:  DISCUSSION: 

The  following  comments  regarding  the  simulation  experiments  are  in  order. 

SIMULATION  1:  UNDAMPED  SINEWAVES  IN  NOISE:  (TABLE  5.1) 

(1)  The  ML  method  has  the  best  performance  as  can  be  expected.  The  MSE  is 

almost  the  same  as  the  CR  bound  up  to  3db  SNR.  The  threshold  occurs  at  SNR 
value  of  about  3db.  By  threshold,  we  mean  that  the  MSE  is  significantly  different 
from  the  CR  bound  at  this  SNR  due  to  the  occurrence  of  outliers. 

(2)  The  FBLP  method  falls  at  about  25db  SNR.  This  is  because  several  of 

the  L-2  extraneous  zeros  fall  close  to  the  unit  circle  at  lower  SNR  values,  result¬ 
ing  in  several  outliers.  See  figure  4.3. 

(3)  In  the  MP  method,  although  the  zeros  obtained  by  rooting  the  polynomial 

G(z)  are  exactly  the  same  as  in  FBLP  method,  the  threshold  occurs  at  a  much  lower 


(h^  A^)  in  the  case  of  simulation  2.  Let  us  denote 
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chosen  L  values  are  best  for  that  method.  At  7db  SNR,  for  the  IP,  MP,  and  TK 
methods  the  bias  is  about  a  third  of  the  Jmean  square  error. 


SNR  because  the  signal  zeros  of  G(z)  are  picked  based  on  the  criterion  in  equation 
5.8,  unlike  as  in  the  FBLP  method.  The  value  of  MSE  at  30db  SNR  in  the  case  of  the 
HP  method  and  the  FBLP  method  are  the  same  because  the  same  zeros  of  G(z)  are 
picked  in  each  trial  in  both  methods. 

(4)  The  performance  of  the  TK  and  IP  methods  are  about  the  same  and  are 
only  slightly  poorer  than  ML  method.  Their  performance  is  slightly  better  than 
the  MP  method. 

SIMULATION  2:  EXPONENTIALLY  DAMPED  SINEWAVES  IN  NOISE:  (TABLES  5. 2-5. 5) 

(1)  The  BLP  method  did  not  give  the  correct  estimates  even  at  SNR“40db  be¬ 
cause  there  were  too  many  extraneous  zeros  outside  the  unit  circle. 

(2)  As  in  Simulation  1,  the  TK  and  IP  methods  perform  slightly  (in  some 
cases  much)  better  than  the  MP  method. 

(3)  The  bias  in  the  estimates  of  the  damping  factors  and  becomes  sig¬ 
nificant  at  lower  SNR  values.  This  bias  can  be  reduced  by  using  the  noise  sin- 

or  A^  in  the  IP  and  TK  methods  as  suggested  by  us  in  ref.  [55]. 

The  results  in  Tables  5. 1-5. 5  are  drawn  in  a  graphical  form  in  figures  5.1- 
5.5.  Further  simulation  results  can  be  found  in  our  papers  [18,50,52-56,92,98,99, 
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LUTION  TEST  CHART 

OF  STAN  DARDS  - 1 963-A 


e  5.3:  Simulation  II 
ds?  See  Table  5.3. 


•  9 


SNR  m  dB  . 

Figure  5 • 5 s  Simulation  II;  Performance  Comparison  of  Different 
Methods;  See  Table  5«5. 


CHAPTER  6:  CONCLUSIONS 


We  assumed  that  a  short  record  of  a  data  sequence  y (n) ,  composed  of  M 

(unknown)  number  of  exponentially  damped  or  undamped  sinusoidal  signals  in 

M  8kn 

noise,  i.e.,  y(n)  •  Z  a,e  +  w(n) ,  n-l,2,...N,  is  available  for  processing. 

k*l 

We  set  out  to  estimate  the  values  of  (a^}  and  {s^}  and  M.  The  goal  was  to  esti¬ 
mate  these  parameters  at  lower  SNR  values  than  possible  using  presently  available 
algorithms.  We  discovered  (or  rediscovered,  in  some  cases)  the  following  main 
results . 

(1)  Estimating  the  parameters  {s^}  from  the  zeros  of  a  polynomial  G(z) 
(i.e.,  using  the  Indirect  approach  originated  by  Prony)  is  preferable  computa¬ 
tionally  and  otherwise,  especially  when  multiple  signal  components  are  present 
in  the  data.  However,  Prony 's  idea  needs  modifications  to  accommodate  the  pres¬ 
ence  of  noise  in  the  data. 

(2)  Selecting  a  value  for  L,  the  degree  of  G(z) ,  greater  than  M  Improves 
the  estimation  accuracy  considerably.  But  this  alone  is  not  sufficient  for 
accurate  parameter  estimation.  Because  the  higher  degree  polynomial,  G(z) ,  has 
L-M  extraneous  zeros  which  results  in  outliers. 

(3)  We  suggested  two  methods  for  eliminating  the  extraneous  zeros  of  G(z) . 
This  is  Important  in  extending  the  threshold  SNR  of  the  methods  presented. 

(3a)  Firstly,  we  presented  a  selection  procedure  (in  chapter  3)  which 
picks  the  M  signal  zeros  out  of  the  L  zeros  of  G(z)  based  on  a  least  squares 
criterion.  This  procedure  extends  the  usefulness  of  Prony' s  method  to  low-SNR 
values.  It  also  gives  a  method  of  estimating  the  value  of  M. 

(3b)  Further  improvement  in  accuracy  of  the  estimates  was  achieved  by 
improving  the  SNR  in  the  data  using  SVD. 
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(4)  The  majority  of  the  improvement  In  accuracy  is  achieved  by  using  a 
value  of  L  greater  than  M,  which  intuitively  means  explaining  part  of  the  noise 

in  the  data  by  L-M  additional  exponentials,  paving  the  way  for  more  accurate  para- 
meter  estimates.  This  is  the  primary  reason  for  extension  of  the  threshold  SNR 
value  as  shown  by  the  results  of  modified  Prony  (MP)  method.  Using  SVD  of  the 
data  matrix  further  improves  the  accuracy. 

(5)  We  pointed  out  several  nice  properties  of  the  polynomial  G(z),  when 

2  2  2 

its  coefficients  are  constrained  such  that  gg*l  and  |gjJ  +  |g2|  +  .  +  |gjJ 

is  minimum.  These  were  first  used  in  ref.  [92]. 

(6)  Other  noteworthy  points  include  (a)  use  of  backward  prediction  to  iden¬ 
tify  the  signal  zeros  from  the  extraneous  zeros  (in  the  case  of  exponentially 
damped  sinusoids) ,  (b)  using  the  extreme  value  of  L  («N-M  or  N-M/2)  to  avoid  ex¬ 
pensive  SVD  calculations  [92]  (see  section  4.4.2),  but  at  the  cost  of  some  per¬ 


formance 


APPENDIX  A:  MINIMUM  PHASE  PROPERTY  OF  THE  ’PREDICTION-ERROR'  FILTER 


In  equation  2.11,  the  prediction-error  filter  polynomial  G(z)«l+  I  g,  z' 

k-1  k 

is  factored  into  two  polynomials  as  below. 


where 


G(z)  -  G1(z)G2(z) 


G.(z)«l+  I  b.z 
1  k-1  * 


G2(z)-1+  Z  c^z"*  A. 3 

k-1  8.  „8* 

The  factor  G^(z)  has  the  M  signal  zeros  located  at  e  (or  e  j ,  k-l,2,...M  as 

shown  in  section  2.1.  If  we  find  the  coefficients  of  G(z),  g^,  g2,  ...  such 
2  2  2 

that  jg^j  +  |g2|  + . +  |g^|  is  minimum  (as  in  proposition  5,  section  2.2) , 


we  wish  to  show  that  the  L-M  zeros  of  G2(z)  will  lie  within  the  unit  circle.  A 
polynomial  is  said  to  be  minimum  phase  if  all  its  zeros  are  within  the  unit 


circle. 


Note  that 


“  2 

minimizing  I  |g.  j  ,  is  the  same  as  minimizing  the  following  quan 
k-1  * 


tity,  Q, 


which  is. 


Q  ■  i+lgj2  +  I s2 1 2  + . +  I*lI 


Q  -  /|G(ejw)|2dw 

-n 

-  /|G.(ejw)|2  jG,(eJw)|2dw  A.5 

-n 

We  now  show  that  the  roots  of  G2(z)  are  inside  the  unit  circle  if  Q  is  minimized, 

following  a  proof  given  by  Lang  and  McClellan  [87]  and  Pakula  and  Ray  [88].  Let 

-1  L_M  -1 

us  factor  G,(z)  as  G,(z)  -  (1-z.z  )G'(z),  where  Gl(z)  -  II  (1-z.z  )  and  z. 

2  2  *  2  2  k-l,k*i  k  1 

is  any  zero  of  G2(z).  Let  Zj-reJ  Then 

Q(r)  -/|G.(eJw)i2  | ( l-reJ (w±”w) ) |2  |G* (eJw) | 2  dw 


«/|G.(e^W)|  (l-2rcos(w.-w)+r  )  |Gl(e^w)|  dw  A. 6 

n  1  * 

Differentiating  Q(r)  with  respect  to  r.  we  get 

■  2/|G1(eJw)|2  (2r-2cos(w1-w))  |G'(ejw)|2dw  A.7 

and  since,  for  ril,  — ^r^>0,  Q(r)  cannot  be  minimized  If  a  zero  of  *-8  on 

or  outside  the  unit  circle.  Hence,  62(2)  is  minimum  phase  and  the  L-M  extraneous 
zeros  of  G(z)  lie  within  the  unit  circle. 


APPENDIX  B:  THE  EIGENVECTORS  OF  A  A  AND  A  A 


In  this  appendix,  we  shall  show  that  the  M  principal  eigenvectors  of  A^A 

and  A  A+  span  the  'signal  subspace'.  The  term  'signal  subspace’  is  explained 

below.  Similar  properties  are  true  for  the  matrices  A'+A'  and  A'  A'+  as  well. 

M  svn 

The  noiseless  signal  sequence  y(n)  is,  y(n)  -  I  a.  e  K  .  Let  A  be  the  forward 

k-1  K 

data  matrix,  and  let  L  satisfy  the  inequality,  MsLs(N-M) . 


y(L) 

y(L-l)  .... 

•  •  y(D 

yO+U 

y(L)  .... 

..  y(2) 

[  y(N-l)  y(N-2)  .  y(N-L)  J 

After  substituting  for  y(n),  the  matrix  A  can  be  decomposed  into  two  matrices  < 
follows . 


aie3lL  a2es2L 
a.esl<L+1>  ... 


aMeSML  j  1  e‘Sl  e*2®1  . e"(L'1)sl 

sm(L+1)  1  e~®2  e“2s2  .  e“(L-l)s2 


1  e'sMe_2sM  .  e“(L“l)sM 


[a/l^15  . aMe8M(N-I)  J 

Let  us  call  the  two  matrices  on  the  right  side  of  B.2,  B  and  C.  Hence, 

A  -  B  C 

(N-L)xM  MxL 


t  t  t 

A  A  -  C  B  B  C  B.^ 

Now,  consider  the  (MxM)  matrix  B+BCC+.  It  is  positive  definite  because  both  B 


and  C  have  full  rank.  Let  us  call  its  eigenvalues  X.,  i-l,2,...M  and  its 
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T 

eigenvectors  •  (w^,  w^2»  •••  wijj)  »  i-l»2,...M.  By  definition 

B+BCC+wi  -  X±wi  ,  i-l,2,...M  B.5 

Premultiplying  the  above  equation  by  C+,  we  get 

C+B+BCC+w.  -  X.C+w.  ,  i«l,2,...M  B16 

— i  l  - l 

t  t  t  t 

Let  us  define  v^^C  w^.  But  A  A“C  B  BC.  Hence  B.6  simplifies  to 

A^Av.  ■  X.v.  i*l,2,...M  B.7 

— i  i— 1 

Therefore,  X^,  i»l,2,...M  are  also  the  non-zero  eigenvalues  of  tb  LxL)  matrix 
A+A.  The  corresponding  eigenvectors  of  A^A  are 

Wil 
W12 


WiM 


Thus  any  principal  eigenvector  of  A^A  will  be  a  linear  combination  of  the  columns 

of  C+.  Similarly  it  can  be  shown  that  any  principal  eigenvector  of  AA^  is  a 

linear  combination  of  the  columns  of  B  (and  hence  linear  combination  of  vectors, 

(1,  e8*-,  e^8*,  ...  e^  ^si)^f  .  .M) .  Any  n-L  contiguous  samples  of  the 

T 

signal  sequence  (for  example,  (y(k) ,y(k+i) ,. . . .y(N-L+k-l))  can  be  expressed  as 
a  linear  combination  of  the  M  principal  eigenvectors  of  AA^*.  Hence  these 
vectors  are  said  to  span  the  'signal  subspace'. 

Extending  the  above,  expressions  for  the  eigenvalues  and  the  signal  sub¬ 
space  eigenvectors  of  AA*  and  A^A  can  be  derived  as  shown  in  ref.  [89].  Similar 
expressions  can  also  be  derived  for  the  backward  data  matrix  and  forward- 
backward  data  matrix  A-  . 


APPENDIX  C:  CRAMER-RAO  BOUND  FORMULAS  FOR  THE  PARAMETERS  OF  EXPONENTIALLY  DAMPED 


SINUSOIDAL  SIGNALS: 


In  this  appendix  we  shall  describe  an  outline  of  the  calculation  of  the  CR 
bounds  for  the  parameters  of  one  (M-l)  and  two  (M-2)  exponentially  damped  signals 
in  white  Gaussian  noise.  At  high  SNR  values  our  parameter  estimates  are  essen¬ 
tially  unbiased.  The  CR  bounds,  in  that  case,  provide  a  lower  bound  on  the 
variance  of  the  parameter  estimates.  The  data  samples  y(n)  are  given  by  the 
formula 


M  + 

y(n)  «  I  b.e-^k  e  akn  J^nf^n  +  w(n),  n*>0,l, . .  .N-l 
k-1  K 


C.l 


where  w(n)  is  a  white  Gaussian  sequence.  The  sampling  interval  is  assumed  one 

2 

second.  The  real  and  imaginary  parts  of  w(n)  have  a  variance  o  .  Comparing 
equation  1.1  with  the  above  formula,  we  see  that  a^-b^e^k  and  s^— a^+jZIIf^, 
k-l,2,...M.  For  convenience  we  shall  relabel  the  4M  parameters  as  follows. 


S4<k-I>  "  £k 


S4(k-1)+1  *  ck 

k-1 ,2, . . .M 

C.2 

S4(k-l)+2  “  *k 

^4 (k-1) +3  “  “k 

£“(y(0) ,y(l) . . 

• 

• 

• 

• 

• 

1 

'm' 

H 

C.3 

8—  (8q»  8^»  . 

T 

- 64(M-l)+3) 

C.4 

The  probability  density  function  for  the  data  vector  conditioned  on  the  unknown 
parameter  vector  can  be  written  as 

?<£/£)  “  (2no2)"Nexp  ^  Z  |y(n)  -  Z  b  ejW“ak+j2IIfk)n|2  C.5 

2  n-0  k-1  K 

The  CR  bound  states  [93]  that  for  any  unbiased  estimate  of  6. 

Var  (f^)  k  [J"1(g>]11  C.6 

where  J(8)  is  the  (4Mx4M)  Fisher  information  matrix  with  elements  [93] 


-  "E  367367  *<X/*> 


We  shall  consider  Che  cases  M-l  and  2.  The  derlvaton  follows  Chat  of  Rife  and 
Boorstyn  [19]  who  derived  similar  expressions  for  the  case  of  sinusoidal  sig¬ 


nals. 


CASE  1 :  M-l 

Using  formula  C.7  above  we  can  wrlCe  down  the  Fisher  information  matrix  for 


this  case  as  follows 


,2 

bl  pi2 


.2 

bl  pil 


J  -  J, 


symmetric  ciPiO 


where 


-bipil 


Cipi2 


i-1  C.8 


N-l  .  -  n 

pi1  "  2  n1  e  /ain  .  j-0,1,2  C. 

J  n-0 

Inverting  J  analytically,  we  find  the  diagonal  terms  of  J  which  provide  the 
bounds  on  the  variance  for  the  parameters  f^  and  as  follows. 


Var  (f x )  Z 


41lV 


P10P12_P11 


Var  (a^)  2 


p10p12-pll 


CASE  2:  M-2 


In  this  case  J  is  an  (8x8)  matrix.  We  partition  J  as  follows 


J  T  '  J 
J12  '  J22 


where  the  matrices  J. .  and  J..  are  given  by  equation  C.8  and  C.9  with  i-1  and 


i-2  respectively.  J..  is  a  matrix  exhibiting  the  interaction  between  the 


parameters  of  the  two  signals. 

blb2r2  bl’l  blb2ri  -blb2’2 

-b2<*l  r0  2q0  -b2rl 

blb2rl  blq0  blb2r0  -blb2ql 

blb2q2  -blrl  blb2’l  blb2r2 

where 

r  .  V  cosi  ~) 

1  -  JHU.2 

N-l  ,  .  .  \ 

q.  ■  E  e~  ai  2^nnj  sinA  j  C.l 

3  on"° 

An  -  2n(f2  -  f j)n  +  02  -  0X 

The  matrix  J  was  inverted  using  a  machine  to  compute  the  CR  bounds  of  the  para 
meters  f ^ ,  f2>  c^,  and  a2  which  are  used  in  chapter  5.  Interestingly,  due  to 
the  special  structure  of  J,  it  turns  out  that  the  diagonal  elements  of  J-1  are 
Independent  of  <)>,-$, . 
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